Enhancing DeepSeek R1 performance for on-device inference with ONNX Runtime
Boost your AI inference performance with DeepSeek R1 optimized for on-device use via ONNX Runtime. This blog explores how to run DeepSeek models efficiently across NPUs, GPUs, and CPUs, achieving speedups of up to 6.3x over PyTorch, and how to convert, quantize, and fine-tune these models using the Olive framework and Azure AI Foundry.
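To give a feel for the quantization step mentioned above, here is a minimal, self-contained sketch of blockwise 4-bit weight quantization in NumPy. The function names and block size are illustrative only; ONNX Runtime's actual int4 kernels additionally pack two 4-bit values per byte and offer symmetric variants, which this sketch omits for clarity.

```python
import numpy as np

def quantize_int4_blockwise(weights: np.ndarray, block_size: int = 32):
    """Quantize a 1-D float array to unsigned 4-bit levels (0..15) per block.

    Each block keeps a float scale and minimum so the original values
    can be approximately reconstructed. Illustrative sketch only.
    """
    n = len(weights)
    # Pad so the array divides evenly into blocks.
    padded = np.pad(weights, (0, -n % block_size))
    blocks = padded.reshape(-1, block_size)
    mins = blocks.min(axis=1, keepdims=True)
    maxs = blocks.max(axis=1, keepdims=True)
    scales = (maxs - mins) / 15.0          # 4 bits -> 16 levels
    scales[scales == 0] = 1.0              # avoid divide-by-zero on flat blocks
    q = np.clip(np.round((blocks - mins) / scales), 0, 15).astype(np.uint8)
    return q, scales, mins

def dequantize_int4_blockwise(q, scales, mins, n):
    """Reconstruct approximate float weights from quantized blocks."""
    return (q * scales + mins).reshape(-1)[:n]

rng = np.random.default_rng(0)
w = rng.standard_normal(100).astype(np.float32)
q, s, m = quantize_int4_blockwise(w)
w_hat = dequantize_int4_blockwise(q, s, m, len(w))
# Rounding to the nearest level bounds per-element error by half a step.
max_err = np.abs(w - w_hat).max()
```

Storing one scale and minimum per small block (rather than per tensor) keeps the reconstruction error low even when weight magnitudes vary across the matrix, which is why blockwise schemes dominate for on-device LLM weights.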
