Enhancing DeepSeek R1 performance for on-device inference with ONNX Runtime
Boost your AI inference performance with DeepSeek R1 optimized for on-device use via ONNX Runtime. This blog explores how to run DeepSeek models efficiently across NPUs, GPUs, and CPUs, achieving speedups of up to 6.3x over PyTorch, and how to convert, quantize, and fine-tune these models using the Olive framework and Azure AI Foundry.
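To give a feel for the quantization step mentioned above, here is a minimal, self-contained sketch of blockwise 4-bit weight quantization in NumPy. The function names and block size are illustrative only; ONNX Runtime's actual int4 kernels additionally pack two 4-bit values per byte and offer symmetric variants, which this sketch omits for clarity.

```python
import numpy as np

def quantize_int4_blockwise(weights: np.ndarray, block_size: int = 32):
    """Quantize a 1-D float array to unsigned 4-bit levels (0..15) per block.

    Each block keeps a float scale and minimum so the original values
    can be approximately reconstructed. Illustrative sketch only.
    """
    n = len(weights)
    # Pad so the array divides evenly into blocks.
    padded = np.pad(weights, (0, -n % block_size))
    blocks = padded.reshape(-1, block_size)
    mins = blocks.min(axis=1, keepdims=True)
    maxs = blocks.max(axis=1, keepdims=True)
    scales = (maxs - mins) / 15.0          # 4 bits -> 16 levels
    scales[scales == 0] = 1.0              # avoid divide-by-zero on flat blocks
    q = np.clip(np.round((blocks - mins) / scales), 0, 15).astype(np.uint8)
    return q, scales, mins

def dequantize_int4_blockwise(q, scales, mins, n):
    """Reconstruct approximate float weights from quantized blocks."""
    return (q * scales + mins).reshape(-1)[:n]

rng = np.random.default_rng(0)
w = rng.standard_normal(100).astype(np.float32)
q, s, m = quantize_int4_blockwise(w)
w_hat = dequantize_int4_blockwise(q, s, m, len(w))
# Rounding to the nearest level bounds per-element error by half a step.
max_err = np.abs(w - w_hat).max()
```

Storing one scale and minimum per small block (rather than per tensor) keeps the reconstruction error low even when weight magnitudes vary across the matrix, which is why blockwise schemes dominate for on-device LLM weights.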
