Insights from Microsoft Build for app developers

It’s conference season for developers and Microsoft’s annual conference Build just wrapped up. I haven’t paid particular attention to Build in the past, but it’s been interesting to follow this year due to Microsoft’s close collaboration with OpenAI and them pushing ahead with integrating generative AI across the board. My primary interest is in what tools they provide to developers and what overarching paradigms they – and OpenAI – are pushing.

The key phrase was Copilots, which can be summarised as “AI assistants” that help you with complex cognitive tasks, and specially do not do these tasks fully by themselves. Microsoft repeatedly pointed out the various limitations of Large Language Models (LLMs) and how to work within these limitations when building applications. Microsoft presented a suite of tools under the Azure AI umbrella to help developers build these applications, leverage the power of OpenAI’s models (as well as other models), and do so in manner where your data stays private and secure, and is not used to train other models.

I’ve captured my full and raw notes separately, but here’s a summary of the key points I took away from the conference are:

LLMs have their limits, but crucially aren’t aware of them, and users cannot be expected to be aware of this either. This requires a different approach to building applications, as out-of-the-box LLMs will appear more powerful than they are. As a developer you’ll need to add guardrails to your application to keep the model on task, complement it with your own data, and provide a way for users to understand the limitations of the model and make it easy for them to spot mistakes and correct them.
Several best practices are emerging for working with LLMs, in particular meta prompts and how to write those in a way to encourage the model to create accurate responses. This is a very iterative process, and you’ll want to test and evaluate your prompts and responses. Microsoft has a service called Prompt Flow that helps with this process.
Grounding the model on your data helps to minimise hallucination/confabulation/bullshitting. Fine-tuning the model is a powerful option, though it’s fairly elaborate and you can tell from the generated output how much was based on your fine-tuned data or the base model. Instead you can inject your data as context into the prompt, i.e., using the Retrieve Augmented Generation (RAG) approach. Azure will provide services for indexing your data both traditionally with keywords but also as vector embeddings, and then atomatically injecting them into the prompt.
Plugins are a powerful way to add dynamic information to your prompts, and to take action on systems. This is a very powerful approach, and I’m curious to see what the best practices will be here.
When integrating chat capabilities in your app you’ll want to consider content filtering to make sure the bot stays on topic and stays clear of topics of harmful, violent, or sexual nature. Microsoft has a Content Safety service that can be used to filter out undesirable content, across various categories and intensity levels. Generated content will be automatically filtered.

The talk by OpenAI’s Andrej Karpathy is well worth a watch as it provides a good high level discussion of the process and data and effort that went into GPT-4, strengths and weaknesses of LLMs, and how to best work within those constraints. The talk by Microsoft’s Kevin Scott and OpenAI’s Greg Brockman is also worth a watch as it provides a good overview of the tools and services that Microsoft is providing to developers, and how to best use them; including some insightful demos.

The main tools that Microsoft announced (though most are still in “Preview” stage and not publicly available):

Azure OpenAI Service provides broad access to OpenAI’s models, but also let’s you use your own models. It’s a managed service that takes care of the infrastructure and scaling, provides a simple API to access the models, and provides various add-on services on top so that you can focus more on your own data and your own application.
AI Studio can best be described as a user-friendly web interface to build and configure your own chatbot. Tell it what data to use, configure and evaluate the meta prompt, set content filters, test and deploy as a web app.
Prompt Flow is a web tool to help you evaluate and improve your meta prompts, and compare the performances of different variations, and do bulk testing of prompts, and evalute the prompts on various metrics.
Azure Cognitive Search is an indexing and search service for your data. Previously it worked on keyword based search, and is gaining support for “vector” search, which means that you can find matching documents by meaning or similarity, even if there’s no match on a keyword basis.

Microsoft is leading the charge and benefiting from their partnership with OpenAI, and their long-term investments into Azure are clearly paying off. Their focus on keeping data safe and trusted, without having it used to train further models or to improve their services, is a key benefit for commercial users. I look forward to getting access to these tools and trying them out.

From an app developers perspective nothing noteworthy stood out there to me, as the tools presented tools are all server-side and web-focussed. The APIs can of course be used from apps, but I can’t help but wonder what Apple’s take on this would look like – Apple seem preoccupied with their upcoming headset (to be announced in the next 24 hours), so their take on this might be due in 2024 instead. I’d be curious about what on-device processing and integrating user data in a privacy-preserving manner would enable.