Microsoft is excited to introduce Phi-3, a family of open AI models developed by the company. The Phi-3 models stand out as the most capable and cost-effective small language models (SLMs) available, surpassing models of similar and larger sizes across various language, reasoning, coding, and maths benchmarks. This release expands the range of high-quality models available to customers, providing more practical choices as they develop and deploy generative AI applications.
Phi-3-mini, a 3.8B-parameter language model, is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. It comes in two context-length variants, 4K and 128K tokens, and is notably the first model in its class to support a context window of up to 128K tokens with minimal impact on quality. Phi-3-mini is instruction-tuned, trained to follow a range of instruction types, so it is ready to use out of the box. It is accessible on Azure AI, where it can leverage the deploy-eval-finetune toolchain, and on Ollama for local development on developers’ laptops. It has also been optimised for ONNX Runtime with support for Windows DirectML, offering cross-platform support across GPU, CPU, and mobile hardware. In addition, it is available as an NVIDIA NIM microservice with a standard API interface that can be deployed anywhere, optimised for NVIDIA GPUs.
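As a concrete illustration of the out-of-the-box instruction-following described above, the sketch below renders a conversation into the chat-prompt layout documented in the Phi-3-mini model card (the `<|system|>`, `<|user|>`, `<|end|>`, and `<|assistant|>` markers). The function name and message structure here are illustrative, not part of any official SDK:

```python
def build_phi3_prompt(messages):
    """Render a list of {"role", "content"} dicts into the Phi-3-mini
    chat layout, where each turn is wrapped in <|role|> ... <|end|>
    markers and the prompt ends with the <|assistant|> tag so the
    model continues with its reply."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # model generates from here
    return "".join(parts)

prompt = build_phi3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this report in one line."},
])
```

In practice the tokenizer shipped with the model applies this template automatically (for example via Hugging Face's chat-template support), so manual formatting like this is mainly useful for debugging or for serving stacks that accept raw prompt strings.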
In the near future, Microsoft plans to add additional models to the Phi-3 family to provide customers with even more flexibility across the quality-cost curve. Phi-3-small (7B) and Phi-3-medium (14B) will soon be available in the Azure AI model catalogue and other model repositories. Microsoft remains committed to offering the best models across the quality-cost curve, and the Phi-3 release further expands the range of models with state-of-the-art small models.
Phi-3 models demonstrate groundbreaking performance at a small size, significantly outperforming language models of similar and larger sizes on key benchmarks. For instance, Phi-3-mini surpasses models twice its size, while Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5 Turbo. All reported numbers are generated using the same pipeline to ensure comparability, although they may differ slightly from other published figures due to variations in evaluation methodologies.
Safety is a paramount consideration in the design of Phi-3 models. Developed in accordance with the Microsoft Responsible AI Standard, which encompasses principles such as accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness, Phi-3 models undergo rigorous safety measurement and evaluation. They are subjected to red-teaming, sensitive use review, and adherence to security guidance to ensure responsible development, testing, and deployment in alignment with Microsoft’s standards and best practices. Building on prior work with Phi models, Phi-3 models are trained using high-quality data and undergo extensive safety post-training, including reinforcement learning from human feedback (RLHF), automated testing and evaluations across dozens of harm categories, and manual red-teaming. Microsoft’s approach to safety training and evaluations is detailed in their technical paper, with recommended uses and limitations outlined in the model cards.
Microsoft’s experience in shipping copilots and facilitating business transformation with generative AI through Azure AI has underscored the increasing demand for models of various sizes across the quality-cost curve for different tasks. Small language models like Phi-3 are particularly suitable for resource-constrained environments, latency-bound scenarios, and cost-constrained use cases. Due to their smaller size, Phi-3 models can be utilised in compute-limited inference environments, with Phi-3-mini being suitable for on-device use, especially when optimised with ONNX Runtime for cross-platform availability. The smaller size also makes fine-tuning or customisation more accessible and affordable. Moreover, their lower computational requirements make them a cost-effective option with significantly lower latency. The longer context window lets the model ingest and reason over large text content, making Phi-3-mini well-suited for analytical tasks.
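To make the long-context point concrete, here is a minimal sketch of fitting a large document into a token budget before sending it to a model. The 128K figure comes from the text above; the rough four-characters-per-token heuristic, the reserve size, and the helper name are assumptions for illustration only:

```python
def chunk_for_context(text, context_tokens=128_000, reserve_tokens=2_000,
                      chars_per_token=4):
    """Split `text` into chunks that each fit within the model's
    context window, reserving room for the instructions and reply.

    Uses a rough characters-per-token heuristic; a real pipeline
    would count tokens with the model's own tokenizer instead.
    """
    budget_chars = (context_tokens - reserve_tokens) * chars_per_token
    return [text[i:i + budget_chars]
            for i in range(0, len(text), budget_chars)]

# A ~1M-character report fits in two passes under a 128K-token window,
# whereas a 4K-token window would need dozens of chunks.
report = "x" * 1_000_000
chunks = chunk_for_context(report)
```

The design point is that a 128K window collapses what would otherwise be a multi-stage map-reduce summarisation into one or two calls, which is what makes the mini model practical for document-analysis workloads.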
Customers are already leveraging Phi-3 to build solutions across various domains. For example, in agriculture, where internet access may be limited, Phi-3 and Microsoft copilot templates are available to farmers at the point of need, running at reduced cost and making AI technologies more accessible. ITC, a leading business conglomerate in India, is using Phi-3 as part of its collaboration with Microsoft on the Krishi Mitra copilot, a farmer-facing app that reaches over a million farmers. Saif Naik, Head of Technology at ITCMAARS, expressed excitement about using fine-tuned versions of Phi-3 to improve efficiency while maintaining accuracy in the Krishi Mitra copilot.
Originating from Microsoft Research, Phi models have been widely used, with Phi-2 being downloaded over 2 million times. The Phi series of models, starting from Phi-1 for Python coding to Phi-1.5 enhancing reasoning and understanding, and then to Phi-2, have achieved remarkable performance through strategic data curation and innovative scaling. Each iteration leverages high-quality training data and knowledge transfer techniques to challenge conventional scaling laws.