Microsoft Build 2023

Era of the AI Copilot

Kevin zScott (CTO, Microsoft), Greg Brockman (President, OpenAI) - should watch

Azure (“The cloud for AI”) is end-to-end platform both for training models but also for accessing APIs.

Copilot: Application using modern AI that has a conversational interface to assist you with complex cognitive tasks.

Greg on GPT4:

Was a labour of love, as it was very hard to improve the old model
Main milestone was when follow-up questions provided much better responses, big step over GPT3
Plugins are designed that it’s a system that can be plugged into any AI, not just ChatGPT

Copilot technology stack to add conversational capabilities to any app to assist users with complex cognitive tasks.

Re-imagining software development with AI

User experience differences
- Less fiddling around of mapping UI elements to specific code
- What is that you want to copilot to be able to do, that the base model isn’t capable of? And what do you want it to not do? To keep model on task.
Application architecture
- Orchestration: Microsoft’s “Semantic Kernel” (separate session) and LangChain (Harrison and his team), and PromptFlow (separate session)
  - Prompt and response filtering: To disallow certain prompts or certain responses (=> See Sarah Bird’s session on safety)
  - Metaprompt: Fine-tuning, overall personality
- Grounding
  - Adding additional context to the prompt (Retrieval Augmented Generation), e.g., to add additional documents to prompts
  - Vector databases using embeddings
  - Can use arbitrary web APIs or plugins
- Plugins
  - Add additional context
  - Take action on systems
- Foundation models and infrastructure
  - On Azure: ChatGPT3.5, GPT4 (soon), or BYO model (Azure AI model catalogue)
Safety & security:
- Media provenance tools to watermark AI-generated content

Example: Kevin Scott’s Behind the Tech podcast and a copilot for a podcast:

Whisper: Get transcript
Dolly 2.0: Extract guest name
Bing Grounding API: Guest Bio
=> Turn these into a big prompt
GPT-4: Create social media blurb
Dall-e 2: Image for post
Linkedin Plugin: Post (Safely ask on the user’s behalf!)

=> See github.com/microsoft/PodcastCopilot

Next Generation AI

Scott Guthrie, Microsoft; Sarah Bird; Thomas Dohmke - Okay to skip, but good references.

GitHub Copilot X

Chat; responsive to selection and has slash commands, too!

Plugins:

For ChatGPT, can be generated using VS Code, also works with Bing Chat
Primarily configured via a “well known” JSON format

New azd command line tool to interact with Azure.

AI orchestration (as in talk above)

Azure AI: AI infrastructure, Azure ML, OpenAI service (including GPT-4 and DALL-E 2)
Can ground and fine tune with your data (very safe to use, not used to train other AI models)
=> Use Azure AI Studio for this
Create new Azure OpenAI resource, then you can use GPT4 and ground it
RAG (Retrieve facts, inject into prompt), can also ‘limit responses to data you provide’. Responses then include references to your data.
Prompt flow support in Azure AI: Can use Prompt Flow to inject live data in prompts, including LangChain and Semantic Kernel, or any API.
Azure AI Content Safety service: Automatically detect undesirable content, using same approach as GitHub CoPilot. Methods on top: Content Safety (monitor for harmful content in real-time) => adjust your setting easily, e.g., to medium violence.
Metaprompt: Include for Safety and Jailbreak; using Prompt Flow. Prompt Flow does automated metaprompt evaluation.

Also checkout Power BI, for analysing data and generating reports, powered by a copilot chat. Check out aka.ms/Fabric.

Getting started with generative AI using Azure OpenAI Service

Pablo Castro, Dom Divakaruni – Okay to skip, goes more in-depth and has some demos.

Main use cases:

Embedded into existing products, e.g., search synthesis or generating content
Helping novices learn topics using a Q&A-based learning style, i.e., be a copilot
Helping experts by offloading some of their tasks

Azure OpenAI services (most upcoming):

Apply your own data
Plugins for OpenAI service
Configurable content filters
Provisioned throughput

Insights:

Fine-tuning is less important with the newer models and instead provided a system message (meta prompt) plus example of user + assistance answers, gets you long way.

Services:

Azure AI Studio as the main gateway to build AI models
Conventional solution of having a vector database and then inject this into the prompt => “Azure OpenAI Service on your data” (in Preview) means you don’t need to do that manually, and can instead give it data sources directly. All done securely, and data is not used to improve system or train ML models.. Includes citations, and checkbox to limited responses to what’s in the content.
Can also publish those playground as a web app
RAG (Retrieval augmented generation): LLMs + your data, creating your own search index.

RAG:

Find the most relevant snippets in a large data collection, using unstructured input as query
- => a (traditional) search engine (e.g., Azure Cognitive Search)
- => or use vector representations (embeddings), which is for retrieving by semantic similarity; can find things that have a similar meaning, even if keywords don’t match
Azure Cognitive Search now also includes vector search, providing both types of search.

Plugins:

For where LLMs fall short
Azure OpenAI Service Plugins: will be compatible OpenAI plugins, but also work will all MS services.

State of GPT

Andrej Karpathy (OpenAI) – Should watch, very useful tips.

Part 1: How to train

Pre-training (internet scale; 1000s of GPUs; very heavy lifting)
Supervised finetuning; which works on very few samples
Reward modeling
Reinforcement learning

Step 1 generates a “base model”, which “wants” to complete documents, i.e., it wouldn’t work for assistants that answer general questions as they would treat a question as a document and would add other questions. Building an assistant comes in at stop 2 with the “SFT model”. The RM/RL is the used to generate good completions, and you end up with “RLHF models”, which generally work better.

But: Base models have higher entropy so are better for things like: “Here’s ten made-up things, add 10 more like it”.

Part 2: How to use them in your apps

Advantages of LLMs:

They are excellent in recalling perfectly what fits into their working memory
They have great recall of many, many facts

Tips on GPT effectiveness:

Give it an example as part of the prompt, as that’ll kick start its memory and make sure it follows the same line of reasoning.
Ask it to show its work, as it’ll therefor think more slowly, having to do less work per token, but getting out better results in the end.
Ask it to retry as there’s a big chance it might unlucky with sampling of some tokens, and others would have led to better results. (As they generally won’t recover from errors automatically.) Alternatively, just ask it to check or point out errors.
LLMs don’t want to succeed, they want to complete; they want to imitate both good and bad results; you need to tell it to succeed. Say, ask it to do something step-by-step in order to make sure to get to the right answer. Or tell it that it’s an expert and should complete it accordingly.
They don’t know what they don’t know; so you can also give that to them in the prompt: “You are not good at figuring out routes yourself, so you rely on the TripGo plug-in for that”.
Constraint prompting to enforce that output fits certain formats
Finetuning: Is to change weights of the models, which is becoming more useful. E.g., LoRA, bitsandbytes and LLaMa. A lot more technically involved.

Good examples: LlamaIndex; and the vector indexes.

Main recommendations

Achieve top possible performance
- Use GPT4
- Use prompts with detailed task context, relevant information, instructions
- Add relevant context to prompt
- Experiment with prompt engineering, few-shot examples, tools/plugins to augment
- Think about chain of conversation rather than single-shot answers
- Maybe look into finetuning
Optimise costs

Downgrade to GPT-3.5
Find shorter prompts

Things to consider:

Models may be biased, hallucinate, have reasoning errors, have knowledge cutoffs, are susceptible to attacks
Use them for copilots rather than full automation, always have human oversight

UX: Designing for Copilot

Rachel Shepard (Azure UI), Kurtis Deavers (M365) – Okay to skip, good sharing of some of their insights but not groundbreaking.

Copilots & Design Systems

What is a copilot? An LLM to assist users in achieving their tasks
UX with co-pilots is probabilistic and a collaborative UX, i.e., working with user
You are accompanying something through their task, being an assistant

Design principles

Help users be aware of limitations, i.e., that LLM can make mistakes, and let users spot mistakes and fix it
Good at: Content generation, Q&A, summarisation, etc.; but not so good at others

System anatomy:

UX is deceptively simple, but there’s some interactions on top
Content design is very important
Consider what’s best: full-screen sheet; beside main content; embedded when interacting with simple piece of UI
Should habe an AI notice whenever some content was generated by AI rather than a human
Single-click suggested prompts (also teaches about capabilities)
Add references wherever possible
Add some friction where appropriate, e.g., before sharing AI generated content if user didn’t yet double check.
Deal with latency, which can take up to a minute, but will be long the way
Use language that’s appropriate and educates users and builds the appropriate model of trust

In practice / lessons:

Good framing of copilot is: “Let me try to help, though I probably won’t be able to do it for you perfect”
Create good guardrails to steer the LLM into the results that you want

See also fluent2.microsoft.com as a UI Kit. Coming soon.