OpenAI Agents SDK Guide (v0.0.17)¶
The OpenAI Agents SDK is a toolkit to help you build agentic AI applications easily. An agent here means a large language model (LLM) like ChatGPT, but with added instructions and tools. The SDK makes it easy to connect agents, tools, and rules (called guardrails) in code. It comes with:
- Agents: LLMs with instructions and tools.
- Handoffs: A way for one agent to hand off tasks to another agent.
- Guardrails: Checks that run alongside your agents to validate inputs or outputs.
The design goals of the SDK are:
- Powerful but simple: It has all the necessary features, but few basic concepts so it's quick to learn.
- Customizable: Works well out of the box, but you can tweak everything to fit your needs.
Key features include:
- Agent loop: Built-in logic that calls the LLM repeatedly, handles tool calls, and loops until done.
- Python-first design: Use normal Python code to chain and orchestrate agents.
- Handoffs: Easily coordinate between multiple agents.
- Guardrails: Run input/output validations in parallel with your agents to stop bad inputs early.
- Function tools: Turn any Python function into a tool, with automatic input schema and validation.
- Tracing: Built-in tracing to visualize, debug, and monitor your flows. (Works with OpenAI’s trace dashboard.)
Why use the Agents SDK¶
The SDK’s main idea is to make complex multi-agent applications easy to build. It’s like a "production-ready upgrade" of earlier experiments (e.g. Swarm). It comes with everything you need to express interactions between tools and agents without a steep learning curve.
Installation¶
Install the Agents SDK with pip:
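```bash
pip install openai-agents
```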
Make sure you also set your OPENAI_API_KEY
environment variable before running code that uses the SDK.
Hello World Example¶
The simplest example:
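A minimal sketch of that example:

```python
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="You are a helpful assistant")

result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
print(result.final_output)
```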
This code creates an agent with a name and instructions, then uses Runner.run_sync
to have the agent write a haiku about recursion. The result.final_output
is the haiku generated by the agent. Make sure to set the OPENAI_API_KEY
before running this.
Quickstart¶
This section shows step-by-step how to set up a project and run agents.
Create a Project and Virtual Environment¶
In a new project folder, set up a Python virtual environment:
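For example (the folder and environment names are just placeholders):

```bash
mkdir my_project
cd my_project
python -m venv .venv
```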
Activate the Virtual Environment¶
Activate it (on Unix):
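Assuming the environment was created as `.venv`:

```bash
source .venv/bin/activate
```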
On Windows, you might run:
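```text
.venv\Scripts\activate
```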
Install the Agents SDK¶
With the environment active, install the SDK:
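```bash
pip install openai-agents
```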
Set the OpenAI API Key¶
The SDK uses OpenAI’s API for the language models. You must provide an API key. The simplest way is to set the environment variable:
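```bash
export OPENAI_API_KEY=sk-...
```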
Alternatively, you can set the key in code using agents.set_default_openai_key
, but using the environment variable is easier for quickstart.
Create Your First Agent¶
In code, create an Agent object. Give it a name and some instructions. For example:
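A minimal sketch:

```python
from agents import Agent

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
)
```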
This creates an agent that will behave like a helpful assistant.
Add More Agents¶
You can create multiple agents. For example, imagine a Spanish-speaking agent and an English-speaking agent:
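A sketch of the two agents (instructions are illustrative):

```python
spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English.",
)
```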
Each agent has its own name and instructions.
Define Handoffs¶
Handoffs let one agent delegate work to another. Specify in an agent’s configuration the agents it can hand off to. For example, a triage agent that decides whether to use Spanish or English agent:
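A sketch of such a triage agent, reusing the two agents defined above:

```python
from agents import Agent, handoff

triage_agent = Agent(
    name="Triage agent",
    instructions="Route the conversation to the Spanish or English agent based on the language of the request.",
    handoffs=[spanish_agent, handoff(english_agent)],
)
```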
Here, handoffs
lists two options: spanish_agent
and a customized handoff to english_agent
. (We used handoff(english_agent)
to illustrate custom settings; with just english_agent
it would use defaults.)
Add Tools (Optional)¶
Agents can use tools to take actions (like web search, code execution, etc.). For example, you could add a weather tool:
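A sketch of such a tool (the canned return value stands in for a real weather lookup):

```python
from agents import function_tool

@function_tool
def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    return f"The weather in {city} is sunny."
```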
Then assign tools to your agent:
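For example, a dedicated weather agent (this particular agent is illustrative):

```python
weather_agent = Agent(
    name="Weather agent",
    instructions="You answer questions about the weather.",
    tools=[get_weather],
)
```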
Now this agent can call get_weather
when it needs weather info.
Define a Guardrail (Optional)¶
Guardrails are checks that run alongside your agent to validate inputs or outputs. For example, you could write a quick guardrail that blocks math homework queries:
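A minimal sketch of such a guardrail (the function name and check are illustrative):

```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

@input_guardrail
async def homework_guardrail(
    ctx: RunContextWrapper, agent: Agent, user_input
) -> GuardrailFunctionOutput:
    # Demonstration only: trip the guardrail if the input mentions "homework".
    is_homework = "homework" in str(user_input).lower()
    return GuardrailFunctionOutput(
        output_info={"is_homework": is_homework},
        tripwire_triggered=is_homework,
    )
```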
This guardrail will stop the agent if it sees "homework" in the input (demonstration only). See the Guardrails section in official docs for more details.
Run the Agent Workflow¶
Once you have agents, tools, handoffs, and guardrails set up, you can run them. Use the Runner
class:
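A minimal sketch, using the triage agent built above:

```python
import asyncio
from agents import Runner

async def main():
    result = await Runner.run(triage_agent, "Hola, ¿cómo estás?")
    print(result.final_output)

asyncio.run(main())
```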
This runs the triage_agent
on the given input. The Runner
handles calling the LLM, executing tool calls, running guardrails, and performing handoffs. The final answer appears in result.final_output
.
Put It All Together¶
A more complete example:
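A compact sketch combining the pieces from this quickstart (agent names, prompts, and the sample question are illustrative):

```python
import asyncio
from agents import Agent, GuardrailFunctionOutput, Runner, function_tool, input_guardrail

@function_tool
def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    return f"The weather in {city} is sunny."

@input_guardrail
async def homework_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    is_homework = "homework" in str(user_input).lower()
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=is_homework)

spanish_agent = Agent(name="Spanish agent", instructions="You only speak Spanish.")
english_agent = Agent(name="English agent", instructions="You only speak English.")
weather_agent = Agent(name="Weather agent", instructions="Answer weather questions.", tools=[get_weather])

triage_agent = Agent(
    name="Triage agent",
    instructions="Route the request to the Spanish, English, or Weather agent.",
    handoffs=[spanish_agent, english_agent, weather_agent],
    input_guardrails=[homework_guardrail],
)

async def main():
    result = await Runner.run(triage_agent, "What's the weather in Madrid?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```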
Here, triage_agent
could hand off the conversation to one of the language agents or the weather agent. The Runner
takes care of running the loop and choosing tools or agents as needed.
View Traces (Optional)¶
If you have tracing enabled (default), you can log into OpenAI’s Traces dashboard (platform.openai.com) to see the detailed steps of the agent run. This is useful for debugging your application.
If needed, you can disable tracing or control it via configuration (see Configuring the SDK). For now, just know that tracing is on by default and captures everything.
Next Steps¶
You’ve seen the basic workflow: set up agents with instructions, add tools and handoffs, and run with Runner.run
or Runner.run_sync
. In the following sections we explain each part in depth:
- Agents: How to configure agents and their properties.
- Running agents: The underlying loop of how agents are executed.
- Results: How to inspect the output of a run.
- Streaming: Getting updates from an agent run in real-time.
- REPL utility: A quick interactive console to test agents.
- Tools: How to use built-in, function, and agent tools.
- MCP: Integrate with external tool providers via the Model Context Protocol.
- Handoffs: How one agent can delegate to another.
- Tracing: How execution is traced and how to use it.
- Context management: Passing context (data or objects) through runs.
- Guardrails: Writing input/output checks for safety.
- Multi-agent: Patterns for orchestrating multiple agents in your app.
- Models: LLMs, including non-OpenAI models via LiteLLM.
- Configuration: SDK-level settings like API keys and logging.
- Visualization: Tools for visualizing agent graphs.
- Release process: How the SDK versions are managed.
Each section below covers these topics.
Agents¶
An Agent is like a virtual assistant powered by an LLM. You create an agent by giving it a name and instructions, and optionally tools or other settings. Here are the main points about agents:
- Name: A unique name to identify the agent.
- Instructions: A prompt or guidelines telling the model how to behave.
- Tools: A list of tools this agent can call.
- Handoffs: A list of other agents to which it can delegate tasks.
- Guardrails: Input/output checks for this agent.
- Context (optional): Type of context object if you want to pass custom data.
- Model: Optionally, which LLM to use (by default it uses OpenAI's GPT models).
Agents are defined by creating an Agent
object. For example:
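A minimal sketch (the tool and instructions are illustrative):

```python
from agents import Agent, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    return f"The weather in {city} is sunny."

agent = Agent(
    name="Weather assistant",
    instructions="You are a helpful assistant. Use tools when they help you answer.",
    tools=[get_weather],
)
```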
Basic Configuration¶
When creating an Agent
, you can pass:
- `name`: The agent's name.
- `instructions`: A string (or list of messages) that tells the agent how to act.
- `tools`: A list of tools this agent can use.
- `handoffs`: Other agents to hand off tasks to.
- `input_type` / `output_type`: (Advanced) Specify the type of input/output expected.
- `guardrails`: Input or output guardrails for this agent.
- `model` / `model_settings`: Specify which LLM model to use (if different from the defaults).
For example, to make an agent always respond in French with GPT-4, you might write:
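A sketch of such an agent (the temperature setting is illustrative):

```python
from openai import AsyncOpenAI
from agents import Agent, ModelSettings, OpenAIChatCompletionsModel

french_agent = Agent(
    name="French agent",
    instructions="You only respond in French.",
    model=OpenAIChatCompletionsModel(model="gpt-4", openai_client=AsyncOpenAI()),
    model_settings=ModelSettings(temperature=0.5),
)
```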
This agent uses GPT-4 via the Chat Completions API and will speak only French.
Context¶
Agents can have access to a context object in your code. This is data you provide to each run. For example, you might want the agent to know the user’s name or have access to a database connection.
- Define any Python object (dataclass, Pydantic model, etc.) for your context.
- Pass this object in
Runner.run(..., context=...)
. - All tools and hooks get a
RunContextWrapper
that includescontext
.
Example:
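A sketch of what that example might look like:

```python
import asyncio
from dataclasses import dataclass
from agents import Agent, RunContextWrapper, Runner, function_tool

@dataclass
class UserInfo:
    name: str
    uid: int

@function_tool
async def greet(wrapper: RunContextWrapper[UserInfo]) -> str:
    """Greet the current user by name."""
    return f"Hello, {wrapper.context.name}!"

async def main():
    user_info = UserInfo(name="John", uid=123)
    agent = Agent[UserInfo](name="Assistant", tools=[greet])
    result = await Runner.run(agent, "Please greet me.", context=user_info)
    print(result.final_output)

asyncio.run(main())
```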
Here, UserInfo
is the local context type. The tool greet
reads wrapper.context.name
to personalize the greeting.
Important: The context object is not sent to the model. It’s only available in your Python code (tools, hooks). If you need data to go to the LLM, include it in instructions
or the prompt.
Dynamic Instructions¶
You can make an agent’s instructions dynamic using Python functions. Instead of a fixed string, instructions
can be a function that returns a string based on context.
For example:
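A sketch, reusing the `UserInfo` dataclass from the previous example:

```python
from agents import Agent, RunContextWrapper

def dynamic_instr(context: RunContextWrapper[UserInfo], agent: Agent[UserInfo]) -> str:
    return f"The user's name is {context.context.name}. Help them with their questions."

agent = Agent[UserInfo](
    name="Assistant",
    instructions=dynamic_instr,
)
```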
Now each time Runner
calls the model, it will call dynamic_instr
with the context object to get the system prompt.
Handoffs¶
A handoff is a way for one agent to transfer control to another agent. This is useful when different agents specialize in different tasks. In the agent’s configuration, you set handoffs
to a list of other agents or Handoff
objects.
For example:
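A sketch of those three agents (instructions are illustrative):

```python
from agents import Agent, handoff

sales_agent = Agent(name="Sales", instructions="Answer sales questions.")
support_agent = Agent(name="Support", instructions="Handle technical support issues.")

front_desk_agent = Agent(
    name="FrontDesk",
    instructions="Talk to the user and route them to the right department.",
    handoffs=[sales_agent, handoff(support_agent)],
)
```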
In the above, the FrontDesk
agent can hand off to either Sales
agent (using defaults) or Support
agent (we used handoff(support_agent)
to show how to customize it, though not necessary here).
Handoffs work as tools that the LLM can call. If the LLM calls transfer_to_Support
, the SDK will switch to that agent.
Lifecycle Events¶
Agents support lifecycle hooks that run at certain points during execution. You attach them by passing a `hooks` object (a subclass of `AgentHooks`) when creating the agent. The main hooks are:

- `on_start`: Runs when the agent becomes the active agent (e.g. for logging or pre-loading data).
- `on_end`: Runs after the agent produces its final output.
- `on_handoff`: Runs when another agent hands off to this agent.
- `on_tool_start`, `on_tool_end`: Run before/after each tool invocation.

These are advanced and help customize the run. For example, you might use `on_tool_start` to log that a tool is about to run.
Guardrails on Agents¶
You can attach input and output guardrails to an agent by passing `input_guardrails` and `output_guardrails` when you create it (see the Guardrails section for how to write them). Before the agent runs and after it produces output, the SDK checks these guardrails and can stop the run if a rule is violated.
Cloning/Copying Agents¶
You can copy an agent's configuration using the `clone()` method if you need a similar agent with small changes. For example:
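A minimal sketch:

```python
pirate_agent = Agent(
    name="Pirate",
    instructions="Write like a pirate.",
)

robot_agent = pirate_agent.clone(
    name="Robot",
    instructions="Write like a robot.",
)
```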
This avoids writing the same settings twice.
Forcing Tool Use¶
If you want to force the agent to use a tool instead of answering directly, set `tool_choice="required"` in the agent's `model_settings`. Be careful: forcing tool use can lead to loops of repeated tool calls, so pair it with clear instructions or limits.
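A minimal sketch, assuming a `get_weather` tool like the one defined earlier:

```python
from agents import Agent, ModelSettings

agent = Agent(
    name="Weather agent",
    instructions="Always look up the weather before answering.",
    tools=[get_weather],
    model_settings=ModelSettings(tool_choice="required"),
)
```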
Running agents¶
To execute (run) agents, use the Runner
class. There are three ways to run an agent:
- `Runner.run(agent, input)`: An asynchronous method. Returns a `RunResult`. You should `await` this in an async function.
- `Runner.run_sync(agent, input)`: A synchronous wrapper around `run()`. Useful if you're not using async.
- `Runner.run_streamed(agent, input)`: An asynchronous method that streams events as the LLM runs. Returns a `RunResultStreaming` which you can iterate over for updates (tokens, tool calls, etc.).
Example of using run
(async):
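A minimal sketch:

```python
import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
    result = await Runner.run(agent, "Write a haiku about recursion in programming.")
    print(result.final_output)

asyncio.run(main())
```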
This prints the agent's final answer (here, a short haiku about recursion).
The Agent Loop¶
When you call Runner.run(...)
, the SDK performs this loop internally:
- Call the LLM for the current agent, giving it the current input and instructions.
- The LLM generates output. Then:
  - If the LLM returns a `final_output` (like a complete answer) and no tool calls, the loop ends and that output is returned.
  - If the LLM calls a handoff tool, the SDK switches to the new agent and updates the input, then continues the loop with the new agent.
  - If the LLM produces tool calls, the SDK executes those tools (in order), collects their outputs as new input, and continues the loop with the same agent.
- If the loop runs more times than `max_turns` (a limit you can set), a `MaxTurnsExceeded` error is raised.
In short, Runner
keeps calling agents and running tools or handoffs until an agent finishes with a final answer.
Streaming¶
If you use Runner.run_streamed()
, you get real-time events as the run proceeds. For example, you might stream partial text to a user.
There are two main kinds of streaming events:
- Raw response events (`RawResponsesStreamEvent`): Low-level tokens or data from the LLM as it generates text. You can use these to display each token as it comes.
- Run item / agent events (`RunItemStreamEvent`, `AgentUpdatedStreamEvent`): Higher-level events for when a message is finished, a tool call completed, or the current agent changed. These let you update your UI at logical steps ("Agent said this", "Tool ran", etc.), instead of on every token.
For example, to print each token as it’s generated:
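A sketch of token-by-token streaming (the agent and prompt are illustrative):

```python
import asyncio
from openai.types.responses import ResponseTextDeltaEvent
from agents import Agent, Runner

async def main():
    agent = Agent(name="Joker", instructions="You are a helpful assistant.")
    result = Runner.run_streamed(agent, input="Please tell me 5 jokes.")
    async for event in result.stream_events():
        # Print each text delta as the model produces it.
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)

asyncio.run(main())
```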
Or, to handle higher-level events and skip raw tokens:
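A sketch that reacts to the higher-level events instead:

```python
import asyncio
from agents import Agent, ItemHelpers, Runner

async def main():
    agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
    result = Runner.run_streamed(agent, input="Tell me a joke, then explain it.")
    async for event in result.stream_events():
        if event.type == "agent_updated_stream_event":
            print(f"Current agent: {event.new_agent.name}")
        elif event.type == "run_item_stream_event":
            if event.item.type == "message_output_item":
                print("Message:", ItemHelpers.text_message_output(event.item))
        # raw_response_event tokens are ignored here

asyncio.run(main())
```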
In this example, we ignore raw tokens and only print when the agent changes or a message is generated.
Run Configuration (run_config)¶
When running agents, you can pass a run_config
argument to control global settings. For example:
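A sketch (the workflow name and group id are placeholders):

```python
import asyncio
from agents import Agent, RunConfig, Runner

async def main():
    agent = Agent(name="Assistant", instructions="Be concise.")
    result = await Runner.run(
        agent,
        "Summarize the plot of Hamlet in two sentences.",
        run_config=RunConfig(
            workflow_name="Homework helper",      # meaningful name for the trace
            group_id="chat-thread-42",            # links traces across turns of one chat
            trace_include_sensitive_data=False,   # omit message text from traces
        ),
    )
    print(result.final_output)

asyncio.run(main())
```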
Some key run_config
options:
- `model`: Override the agent's model and use this one instead.
- `model_provider`: Set a custom model provider (default is OpenAI).
- `model_settings`: Override settings like temperature or top_p for all agents.
- `input_guardrails` / `output_guardrails`: Global guardrails to apply to every run.
- `handoff_input_filter`: A global filter for inputs when handoffs happen.
- `tracing_disabled`: Turn off tracing for this run.
- `trace_include_sensitive_data`: Whether to include the actual text in traces.
- `workflow_name`, `trace_id`, `group_id`: Set custom tracing metadata. `workflow_name` should be set to something meaningful; `group_id` can link traces across multiple turns (like a chat thread).
- `trace_metadata`: Additional metadata for traces.
These let you customize how agents run at a high level.
Conversations / Chat Threads¶
Each call to Runner.run
is like one turn in a conversation, even if multiple agents or tools ran internally. For example:
- User turn: User asks a question.
- Runner run: An agent answers (maybe using handoffs and tools).
- User sees answer.
If the user then asks a follow-up, you would call Runner.run
again. To keep context, use the result.to_input_list()
method to get a list of all messages from the last run, and then append the new user message. For example:
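A sketch of a two-turn conversation (the questions are illustrative):

```python
import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(name="Assistant", instructions="Reply very concisely.")

    result = await Runner.run(agent, "What city is the Golden Gate Bridge in?")
    print(result.final_output)

    # Next turn: reuse the previous items and append the new user message.
    new_input = result.to_input_list() + [{"role": "user", "content": "What state is it in?"}]
    result = await Runner.run(agent, new_input)
    print(result.final_output)

asyncio.run(main())
```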
The to_input_list()
takes all the items (user message, agent messages, tool outputs) from the run and makes them into a list of conversation messages. You can then add the next user prompt and run again.
Exceptions¶
The SDK raises exceptions for various error cases. Key ones include:
- `MaxTurnsExceeded`: The run used more than the allowed number of turns.
- `ModelBehaviorError`: The model returned something malformed (like broken JSON or an invalid tool call).
- `UserError`: Your code used the SDK incorrectly.
- `InputGuardrailTripwireTriggered`, `OutputGuardrailTripwireTriggered`: A guardrail's tripwire was hit.

All exceptions inherit from `AgentsException`. Check `agents.exceptions` for the full list.
Results¶
When you run an agent, the result is a RunResult
(or RunResultStreaming
for streaming). Both inherit from RunResultBase
. The main pieces of information in the result are:
- `final_output`: The final answer from the last agent that ran. If the last agent didn't specify an `output_type`, this will be a plain string. If `output_type` was set, `final_output` will be an object of that type. For example, if the agent outputs JSON parsed into a Pydantic model, `final_output` is that model object.
- `to_input_list()`: A method that gives you all the conversation items (original input, agent messages, tool outputs, etc.) as a list you can feed to the next run.
- `last_agent`: The `Agent` object that was the final one to run. Handy if you need to know which agent answered last.
- `new_items`: A list of `RunItem` objects for each new item generated during the run. There are different types of items:
  - `MessageOutputItem`: A message from the LLM (like the agent's response).
  - `ToolCallItem`: The LLM invoked a tool (before execution).
  - `ToolCallOutputItem`: A tool was called and returned output.
  - `HandoffCallItem`: The LLM called a handoff tool (before switching).
  - `HandoffOutputItem`: A handoff was executed and returned a result.
  - `ReasoningItem`: An internal reasoning or trace item (if your agent outputs something like that).

  Each of these has the raw item inside it, plus easy accessors. For example, for a `MessageOutputItem`, you can get the text with `ItemHelpers.text_message_output(item)`.
- Other info:
  - `input_guardrail_results` and `output_guardrail_results`: Results of any guardrails run. Useful for logging or analysis.
  - `raw_responses`: The raw model responses (each LLM API response) during the run.
  - `input`: The original input you passed. (Usually you won't need this.)
Example use:
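A sketch of inspecting a result (the helper function is illustrative):

```python
from agents import ItemHelpers, MessageOutputItem, Runner

async def inspect_run(agent, question: str) -> None:
    result = await Runner.run(agent, question)
    print("Final answer:", result.final_output)
    print("Answered by:", result.last_agent.name)
    for item in result.new_items:
        if isinstance(item, MessageOutputItem):
            print("Message:", ItemHelpers.text_message_output(item))
        else:
            print("Other item:", item.type)
```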
This will show the final answer, which agent gave it, and details about each message or tool used.
Streaming¶
You saw how to get streaming updates with run_streamed()
. In the Streaming section above, we covered how to subscribe to streaming events while an agent run is happening.
Recap:
- Call `Runner.run_streamed(agent, input)` to start streaming.
- `result.stream_events()` is an async iterator of `StreamEvent` objects.
- Event types:
  - `raw_response_event`: Contains raw LLM stream data (token deltas, etc.).
  - `run_item_stream_event`: High-level events for when a run item (message or tool output) is completed.
  - `agent_updated_stream_event`: When the current agent changes (e.g. after a handoff).
Use these events to provide live feedback in your application.
REPL Utility¶
The SDK has a quick REPL (read-eval-print loop) for interactive testing. It uses run_demo_loop
.
Example usage:
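A minimal sketch, assuming `run_demo_loop` is importable from the top-level `agents` package:

```python
import asyncio
from agents import Agent, run_demo_loop

async def main() -> None:
    agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
    await run_demo_loop(agent)

if __name__ == "__main__":
    asyncio.run(main())
```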
When you run this script, it will prompt you to enter text in the console. It sends your input to the agent, and streams the model’s output back in real time. It keeps a conversation history automatically. To exit the loop, type quit
or exit
(or press Ctrl-D).
This is great for quick experiments.
Tools¶
Agents can use tools to interact with the outside world or run code. The SDK provides three categories of tools:
- Hosted tools: Built-in tools running on OpenAI's servers (with the `OpenAIResponsesModel`):
  - WebSearchTool: Search the web.
  - FileSearchTool: Search files in your OpenAI vector stores.
  - ComputerTool: Automate computer tasks.
  - CodeInterpreterTool: Run code in a sandbox.
  - HostedMCPTool: Expose tools from an MCP server (see MCP section).
  - ImageGenerationTool: Generate images from text.
  - LocalShellTool: Run shell commands locally.
Example of using hosted tools:
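A sketch using two hosted tools (the vector store ID is a placeholder):

```python
from agents import Agent, FileSearchTool, Runner, WebSearchTool

agent = Agent(
    name="Assistant",
    tools=[
        WebSearchTool(),
        FileSearchTool(max_num_results=3, vector_store_ids=["VECTOR_STORE_ID"]),
    ],
)

async def main():
    result = await Runner.run(agent, "What's in the news about coffee today?")
    print(result.final_output)
```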
- Function tools: Turn any Python function into a tool using the `@function_tool` decorator. The SDK automatically generates the tool schema from the function signature:
  - Tool name = function name (or override with `name_override`).
  - Description = function docstring (or provide one).
  - Input schema = generated from the function arguments (using Python types).
  - Input descriptions = taken from the docstring (unless disabled).
  - The function can be `async` or normal, and can take a `RunContextWrapper` as the first argument if you need context.
Example:
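A sketch of such a tool (the canned return value stands in for a real API call):

```python
from typing_extensions import TypedDict
from agents import function_tool

class Location(TypedDict):
    lat: float
    long: float

@function_tool
async def fetch_weather(location: Location) -> str:
    """Fetch the weather for a given location.

    Args:
        location: The location to fetch the weather for.
    """
    # In a real tool you would call a weather API here.
    return "sunny"
```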
The SDK will inspect `fetch_weather` and create a `FunctionTool` with a JSON schema for its inputs. For complex inputs, you can use Pydantic models or TypedDicts. The tool's `params_json_schema` is generated automatically from the function signature and docstring.
You can also create custom function tools manually by instantiating FunctionTool
and providing name
, description
, params_json_schema
and an on_invoke_tool
async function. But using @function_tool
is usually easier.
- Agents as tools: You can treat an agent itself as a tool. This is useful if you want one agent to call another without a full handoff. Use the
agent.as_tool()
method. For example:
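A sketch (the translator agents and descriptions are illustrative):

```python
from agents import Agent

spanish_agent = Agent(name="Spanish translator", instructions="Translate the user's message to Spanish.")
french_agent = Agent(name="French translator", instructions="Translate the user's message to French.")

orchestrator_agent = Agent(
    name="Orchestrator",
    instructions="Translate the user's message using the translation tools.",
    tools=[
        spanish_agent.as_tool(
            tool_name="to_spanish",
            tool_description="Translate the message to Spanish",
        ),
        french_agent.as_tool(
            tool_name="to_french",
            tool_description="Translate the message to French",
        ),
    ],
)
```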
Here, to_spanish
and to_french
are tools backed by running the respective agents under the hood.
You can customize agent-tools further:
- Use `custom_output_extractor` to process the sub-agent's output before returning it.
- For advanced use, you can manually call `Runner.run` inside a `@function_tool` instead of using `as_tool`.
- Error handling in tools: If a function tool raises an error, by default the LLM will be told an error occurred. You can supply a `failure_error_function` to provide a custom message to the LLM when a tool fails, or set it to `None` to let exceptions propagate (which will raise errors in your Python code). For custom-created `FunctionTool`s, you should catch errors inside `on_invoke_tool`.
Model Context Protocol (MCP)¶
MCP stands for Model Context Protocol, a standard for giving LLMs access to external tools and data sources. The Agents SDK supports MCP so you can integrate external servers.
MCP servers come in three types:
- Stdio servers: Run as subprocesses of your app (local).
- HTTP SSE servers: Remote, accessed via HTTP+Server-Sent Events.
- Streamable HTTP servers: Remote, using a streaming HTTP protocol.
The SDK has a class for each: `MCPServerStdio`, `MCPServerSse`, and `MCPServerStreamableHttp`.
You use them by connecting to an MCP server. For example, using the official MCP filesystem server via npm:
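A sketch of connecting to that server (the `./samples` directory is a placeholder):

```python
import asyncio
from agents.mcp import MCPServerStdio

async def main():
    # Serve files from a local directory using the official filesystem MCP server.
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "./samples"],
        }
    ) as mcp_server:
        ...  # create and run an agent here (see below)

asyncio.run(main())
```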
Once you have an mcp_server
object, add it to an agent:
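A minimal sketch:

```python
from agents import Agent

agent = Agent(
    name="Assistant",
    instructions="Use the filesystem tools to help the user with their files.",
    mcp_servers=[mcp_server],
)
```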
The SDK will call list_tools()
on each MCP server at the start of every run. The agent will see those tools available. If the agent calls one of those tools, the SDK will call mcp_server.call_tool(...)
to execute it.
Caching: By default, list_tools()
is called on every run (which could be slow for remote servers). You can enable caching by passing cache_tools_list=True
when creating the MCP server object. This will reuse the same tool list each time. Only use caching if the tool list never changes. You can invalidate the cache manually with invalidate_tools_cache()
.
For examples, see the official examples repo (https://github.com/openai/openai-agents-python/tree/main/examples/mcp). Note that tracing automatically captures MCP calls too.
Handoffs¶
Handoffs are a way for one agent to hand off a task to another agent. This is useful when different agents have different specialties. For example, a support agent could hand off to a Refund agent
or a Sales agent
depending on the question.
When an agent does a handoff, it’s like it calls a special tool. If the agent named Refund agent
is configured as a handoff target, the tool would be called transfer_to_refund_agent
by default.
Creating a Handoff¶
In an agent’s configuration, the handoffs
parameter lists the possible handoff targets. You can just list agents, or use the handoff()
helper to customize. For example:
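A sketch (the agents and instructions are illustrative):

```python
from agents import Agent, handoff

refund_agent = Agent(name="Refund agent", instructions="Handle refund requests.")
sales_agent = Agent(name="Sales agent", instructions="Answer sales questions.")

support_agent = Agent(
    name="Support agent",
    instructions="Help the customer and hand off when appropriate.",
    handoffs=[refund_agent, handoff(sales_agent)],
)
```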
When listed as `handoffs=[refund_agent]`, the SDK will by default create a tool named `transfer_to_refund_agent` with a default description like "handoff to Refund agent".
Customizing with handoff()¶
The handoff()
function lets you fine-tune the handoff:
- `agent`: The `Agent` object to hand off to.
- `tool_name_override`: Provide a custom name for the tool (otherwise `Handoff.default_tool_name()` is used).
- `tool_description_override`: Custom description text for the tool.
- `on_handoff`: A callback function that runs when the handoff is invoked. It receives a context and optionally the LLM-provided input. Useful for triggering side effects or logging.
- `input_type`: The type (e.g. a Pydantic model) of data the LLM should supply when doing the handoff.
- `input_filter`: A function to filter or modify the conversation history passed to the new agent.
Example with custom settings:
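A sketch of a customized handoff (the names and callback are illustrative):

```python
from agents import Agent, RunContextWrapper, handoff

def on_handoff(ctx: RunContextWrapper[None]) -> None:
    print("Handoff to the support agent was invoked")

support_agent = Agent(name="Support agent", instructions="Handle support questions.")

handoff_obj = handoff(
    agent=support_agent,
    tool_name_override="custom_handoff_tool",
    tool_description_override="Transfer tricky questions to the support specialist.",
    on_handoff=on_handoff,
)
```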
Here handoff_obj
is a Handoff
with custom tool name and description, and an on_handoff
callback that prints something.
Handoff Inputs¶
Sometimes you want the user (via the LLM) to provide some data when calling a handoff. For instance, if handing off to an "Escalation agent", you might ask for a reason. You specify input_type
for this:
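A sketch using a Pydantic model for the handoff input:

```python
from pydantic import BaseModel
from agents import Agent, RunContextWrapper, handoff

class EscalationData(BaseModel):
    reason: str

async def on_handoff(ctx: RunContextWrapper[None], input_data: EscalationData) -> None:
    print(f"Escalation agent called with reason: {input_data.reason}")

escalation_agent = Agent(name="Escalation agent", instructions="Handle escalated issues.")

handoff_obj = handoff(
    agent=escalation_agent,
    on_handoff=on_handoff,
    input_type=EscalationData,
)
```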
Now the LLM should provide a JSON (or equivalent) matching EscalationData
when it calls this handoff. Your on_handoff
function gets that parsed data.
Input Filters¶
By default, when a handoff happens, the next agent sees the entire conversation history. If you want to change what the new agent sees, use an input_filter
. An input filter takes a HandoffInputData
(which has the context and previous items) and returns a new HandoffInputData
to use.
For common use-cases, the SDK provides filters. For example, to remove all tool calls from history when handing off to the FAQ agent:
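A minimal sketch using the built-in filter:

```python
from agents import Agent, handoff
from agents.extensions import handoff_filters

faq_agent = Agent(name="FAQ agent", instructions="Answer frequently asked questions.")

handoff_obj = handoff(
    agent=faq_agent,
    input_filter=handoff_filters.remove_all_tools,  # strip tool calls from the history
)
```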
This means the FAQ agent only sees the user’s messages, without any internal tool calls from before.
Recommended Prompts for Handoffs¶
To help LLMs handle handoffs properly, the SDK provides some prompt templates. You can either prepend agents.extensions.handoff_prompt.RECOMMENDED_PROMPT_PREFIX
to your instructions, or use prompt_with_handoff_instructions()
to automatically add instructions about handoff to your agent’s prompt.
For example:
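A minimal sketch using the helper function:

```python
from agents import Agent
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions

billing_agent = Agent(
    name="Billing agent",
    instructions=prompt_with_handoff_instructions(
        "Help customers with billing questions."
    ),
)
```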
This ensures your agents know how the handoff system works.
Tracing¶
The Agents SDK automatically traces (logs) everything that happens during a run. This includes:
- LLM calls
- Tool calls
- Handoffs
- Guardrails
- etc.
Traces are sent to OpenAI’s Traces dashboard (platform.openai.com) so you can visualize and debug your workflows.
Tracing is on by default. You can disable it by:
* Setting the environment variable OPENAI_AGENTS_DISABLE_TRACING=1.
* Calling set_tracing_disabled(True) in code.
* For a specific run, setting RunConfig(tracing_disabled=True).
Traces and Spans¶
- Trace: Represents an entire operation (workflow) from start to finish. For example, one user conversation with possibly multiple agent calls could be one trace. Key properties of a trace:
  - `workflow_name`: Name of the workflow (e.g. "Customer Support").
  - `trace_id`: A unique ID. If you don't set one, it's generated automatically (like `trace_<32_chars>`).
  - `group_id`: Optional. You can use this to link related traces (for example, all turns in the same chat).
  - `disabled`: If true, nothing is recorded.
  - `metadata`: Extra data you can attach.
- Span: Represents a single timed operation within the trace. Spans are nested. Each span has:
  - `trace_id` (which trace it belongs to).
  - `parent_id` (which other span contains it, if any).
  - `start` and `end` timestamps.
  - `span_data`: Information about what happened (like an agent call, or a generation, etc.).
By default, the SDK creates traces and spans automatically:
- The `Runner.run` call is wrapped in a trace.
- Each agent run is an `agent_span`.
- Each LLM generation is a `generation_span`.
- Each function (tool) call is a `function_span`.
- Each guardrail check is a `guardrail_span`.
- Each handoff is a `handoff_span`.
- If using voice pipelines, there are spans for transcription (`transcription_span`) and speech output (`speech_span`), etc.
By default, the top-level trace is named "Agent workflow". You can change the name by using `with trace("Name"): ...` or via `RunConfig` (e.g. `workflow_name`).
Higher-level Traces¶
If you call Runner.run
multiple times but want them in one big trace, use the trace()
context manager:
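A sketch of grouping two runs under one trace (the joke workflow is illustrative):

```python
import asyncio
from agents import Agent, Runner, trace

async def main():
    agent = Agent(name="Joke generator", instructions="Tell funny jokes.")

    with trace("Joke workflow"):
        first_result = await Runner.run(agent, "Tell me a joke")
        second_result = await Runner.run(agent, f"Rate this joke: {first_result.final_output}")
        print(f"Joke: {first_result.final_output}")
        print(f"Rating: {second_result.final_output}")

asyncio.run(main())
```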
Because of with trace("Joke workflow")
, both run
calls are in the same trace.
Creating Traces Manually¶
If needed, you can manually start/finish traces:
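A sketch of manual trace management; the `mark_as_current` / `reset_current` flags shown here reflect the trace API as I understand it, so treat them as an assumption:

```python
from agents import trace

workflow_trace = trace("My workflow")
workflow_trace.start(mark_as_current=True)
try:
    ...  # run agents here
finally:
    workflow_trace.finish(reset_current=True)
```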
However, using with trace(...)
is recommended.
Spans¶
Normally you don’t need to create spans manually. But if you have a custom operation, use @custom_span()
decorator:
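A minimal sketch (the helper function is hypothetical):

```python
from agents import custom_span

def fetch_user_records():
    # Recorded as a span inside the currently active trace.
    with custom_span("fetch_user_records"):
        ...  # your custom work here
```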
This creates a span in the current trace. By default, new spans are children of the current open span.
Sensitive Data¶
By default, the trace includes inputs/outputs of generations and function calls (which may contain sensitive text). You can disable this:
- Set `RunConfig.trace_include_sensitive_data=False` to omit LLM and tool inputs/outputs.
- For voice pipelines, set `VoicePipelineConfig.trace_include_sensitive_audio_data=False` to omit audio data.
Custom Trace Processors¶
The SDK uses OpenAI’s trace exporter by default. You can add your own processors to send spans elsewhere:
- `add_trace_processor(your_processor)` adds an extra processor (so the original OpenAI export still happens).
- `set_trace_processors([proc1, proc2, ...])` replaces the default processors. If you do this, you must include an exporter if you still want to send traces to OpenAI (unless you intend to only use other backends).
The docs mention supported external processors like Weights & Biases, Arize, MLflow, etc., which you can plug in by adding their processor.
Context Management¶
“Context” can mean two things here:
- Local (Python) context: Data in your code that you want available in tools, hooks, etc. For example, a user object or a database client.
- Agent/LLM context: The text that the LLM sees as part of the conversation.
Local Context (RunContextWrapper)¶
Local context is passed to tools and hooks via the RunContextWrapper
. You provide a context object when calling run. All parts of the run share this context (tools, handoff callbacks, etc.).
How to use:
- Create any Python object (e.g. dataclass or Pydantic model) for context.
- Pass it to `Runner.run(..., context=my_context)`.
- Your tools or hooks can declare they accept `RunContextWrapper[MyContextType]`.
- Inside, use `wrapper.context` to access your data.
The `UserInfo` / `greet` example shown earlier in the Agents section is exactly this pattern.
Key points:
- Every part of the run must agree on the type of context.
- The context object is not sent to the LLM; it’s only for your code.
- Use it for any data or resources your code needs (user info, config, database handles, loggers, etc.).
Agent/LLM Context (Prompt, System Instructions, etc.)¶
The LLM only sees what you put in the conversation history: instructions, system messages, and input messages. To give the LLM new data:
- Instructions: You can put static data (e.g. current date, user name) in the agent's instructions (system prompt); `instructions` can also be a function that uses the context to produce the string.
- Input: You can prepend data to the user prompt when calling `Runner.run`.
- Function tools: Let the LLM fetch data on demand via tools.
- Retrieval/Web search tools: Use the built-in tools to fetch relevant documents or web info.
For example, to make user’s name available to the model, you might do:
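A minimal sketch (`user_name` is a hypothetical value from your own code):

```python
from agents import Agent

user_name = "Alice"  # hypothetical value pulled from your local context

agent = Agent(
    name="Assistant",
    instructions=f"The user's name is {user_name}. Address them by name.",
)
```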
Or have a @function_tool
that returns user profile and let the model call it when needed.
In short, any extra context for the LLM must come through the messages it sees.
Guardrails¶
Guardrails are like side-checks that run in parallel with your agent. They let you validate or sanitize inputs and outputs cheaply. For instance, you might run a quick check on user input to block harmful queries before using an expensive model.
There are two types of guardrails:
- Input guardrails: Run on the user’s initial input, before any agent.
- Output guardrails: Run on the final output of the last agent.
Each guardrail is essentially a function you write that returns a GuardrailFunctionOutput
. It contains:
- Some `output_info` (optional additional info).
- A `tripwire_triggered` boolean flag.
If tripwire_triggered
is True
, the SDK raises an exception (InputGuardrailTripwireTriggered
or OutputGuardrailTripwireTriggered
) and stops the run.
Input Guardrails¶
Input guardrails run in three steps:
- Receive the initial input (text or message list).
- Run your guardrail function to get a `GuardrailFunctionOutput`.
- Check `tripwire_triggered`. If it is true, an `InputGuardrailTripwireTriggered` exception is raised.
Write input guardrails with the @input_guardrail
decorator. For example:
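A sketch of such a guardrail (the function name and keyword check are illustrative):

```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

@input_guardrail
async def math_homework_guardrail(
    ctx: RunContextWrapper, agent: Agent, user_input
) -> GuardrailFunctionOutput:
    # Demonstration only: trip the guardrail on obvious math-homework phrasing.
    looks_like_homework = "solve for x" in str(user_input).lower()
    return GuardrailFunctionOutput(
        output_info={"looks_like_homework": looks_like_homework},
        tripwire_triggered=looks_like_homework,
    )
```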
Attach it to an agent:
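```python
from agents import Agent

agent = Agent(
    name="Customer support agent",
    instructions="You help customers with their questions.",
    input_guardrails=[math_homework_guardrail],
)
```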
If a user asks something with "solve for x", the guardrail trips and the agent stops running the expensive model.
Note: Input guardrails only run for the first agent in a run, since they’re meant for initial user input.
Output Guardrails¶
Output guardrails are similar but run after the agent produces its final output:
- Receive the final output (as a string or object).
- Run the guardrail function to get a `GuardrailFunctionOutput`.
- If `tripwire_triggered` is true, raise `OutputGuardrailTripwireTriggered`.
Example output guardrail:
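A sketch (the guardrail name and the "secretcode" check are illustrative):

```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail
async def no_secrets_guardrail(
    ctx: RunContextWrapper, agent: Agent, output
) -> GuardrailFunctionOutput:
    leaked = "secretcode" in str(output).lower()
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=leaked)

agent = Agent(
    name="Assistant",
    instructions="Never reveal internal codes.",
    output_guardrails=[no_secrets_guardrail],
)
```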
If the model’s answer contains "secretcode", the guardrail trips and no output is returned.
Note: Output guardrails only run for the last agent, since they check the final answer.
Tripwires¶
When any guardrail tripwire is triggered, the run is immediately halted with the corresponding exception. You can catch these exceptions to handle them. For example:
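A sketch, assuming the guarded `agent` from the example above:

```python
import asyncio
from agents import InputGuardrailTripwireTriggered, Runner

async def main():
    try:
        result = await Runner.run(agent, "Can you solve for x: 2x + 3 = 11?")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        print("The input guardrail blocked this request.")

asyncio.run(main())
```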
When a tripwire triggers, no output is returned for that run.
Implementing a Guardrail¶
Here’s a fuller example of an input guardrail implemented by running a mini-agent:
Here, math_guardrail runs a small helper agent to decide whether the input is a homework request; if it is, it triggers the tripwire. Output guardrails work the same way. See the official guardrails docs for another example: https://openai.github.io/openai-agents-python/guardrails/.
Orchestrating multiple agents¶
Orchestration means controlling which agents run and in what order to achieve your application’s goals. There are two main approaches:
- Via LLM (Autonomous Planning): Give an agent tools and have it decide the steps. It can use web search, function tools, code, and handoffs to figure out what to do. This works well for open-ended tasks. For success:
  - Write good prompts listing available tools and constraints.
  - Monitor your app and improve prompts as needed.
  - Let agents self-improve (e.g. by critiquing themselves).
  - Use specialized agents rather than one do-it-all agent.
  - Use evaluations (OpenAI Evals) to train and improve agents.

  Example: A research agent could have tools for web search, database lookup, code execution, and handoffs to writing agents. You prompt it to plan and use these tools to answer a query.

- Via code (Deterministic chaining): You control the flow in your Python code. This can be more predictable and efficient. Patterns include:
  - Asking one agent to categorize a task, then picking the next agent in code based on that category.
  - Chaining agents: Agent 1 output -> Agent 2 input -> Agent 3, etc. (e.g. outline -> draft -> critique).
  - Running feedback loops: have one agent answer and another critique it, repeating until the result is good enough.
  - Running agents in parallel (e.g. with `asyncio.gather`) for independent sub-tasks.
There are examples of these patterns in the official repo: https://github.com/openai/openai-agents-python/tree/main/examples/agent_patterns.
Mixing both methods is fine: you might have a high-level loop in Python that sometimes lets an agent plan its own steps and sometimes runs agents in a fixed sequence. Use whichever approach suits your task. Autonomous planning is powerful for vague tasks, but code orchestration is safer and more controllable for well-defined pipelines.
Models¶
By default, the SDK supports OpenAI models in two ways:
- OpenAIResponsesModel: Uses the new Responses API.
- OpenAIChatCompletionsModel: Uses the classic Chat Completions API.
We recommend using OpenAIResponsesModel
with OpenAI’s latest models when possible.
Non-OpenAI Models (LiteLLM)¶
You can use other LLM providers via the LiteLLM integration. First, install the extra dependencies:
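```bash
pip install "openai-agents[litellm]"
```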
Then you can use the litellm/
prefix with many models. For example:
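A sketch; the exact model identifiers are illustrative and depend on what LiteLLM supports:

```python
from agents import Agent

claude_agent = Agent(
    name="Claude agent",
    instructions="Be concise.",
    model="litellm/anthropic/claude-3-5-sonnet-20240620",
)

gemini_agent = Agent(
    name="Gemini agent",
    instructions="Be concise.",
    model="litellm/gemini/gemini-1.5-flash",
)
```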
This uses the LiteLLM library to call those models.
You can also integrate other LLMs in other ways:
- `set_default_openai_client(...)`: If an LLM provider has an OpenAI-compatible endpoint, you can give an `AsyncOpenAI(base_url=..., api_key=...)` client to the SDK as the default client.
- `ModelProvider`: At each run, you can pass a `model_provider` to use a different provider for that run.
- `Agent.model`: You can also give a specific agent a custom `Model` object (like an `OpenAIChatCompletionsModel` instance or a `LitellmModel`).
If you use non-OpenAI models, consider that:
- Not all providers support the new Responses API; many only support the Chat Completions API. If you get a 404, either call `set_default_openai_api("chat_completions")` or use `OpenAIChatCompletionsModel`.
- Not all providers support structured JSON outputs. If you try to use a JSON schema output on a provider that doesn't support it, you may get errors like `BadRequestError`.
- Providers may lack features: some don't support file search, web search, or vision. Make sure the features you use are supported, or avoid them.
Mixing and Matching Models¶
Within one app, you might use different models for different agents. For example, a light agent for routing and a big agent for deep tasks. To do this, set each agent’s model
:
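A sketch of mixing a small and a large model (the model names are examples):

```python
from agents import Agent

research_agent = Agent(
    name="Research agent",
    instructions="Answer in depth.",
    model="gpt-4o",          # larger model for the heavy lifting
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Route the request to the right specialist.",
    model="gpt-4o-mini",     # small, fast model for routing
    handoffs=[research_agent],
)
```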
You can also give a ModelSettings(temperature=...)
to fine-tune each agent’s model parameters.
Note: Try to stick with one model type (Responses vs Chat) per workflow, because the SDK’s features and tools support may differ. If you do mix them, make sure any feature you use (like structured outputs or multimodal) is supported by all providers involved.
Common issues when using other providers:
- Tracing 401 error: Traces are sent to OpenAI, so if you don't have an OpenAI key you'll get a 401. Fix this by disabling tracing (`set_tracing_disabled(True)`) or by setting an OpenAI key just for traces (`set_tracing_export_api_key(...)`).
- Chat vs Responses API: By default the SDK uses the Responses API, which many providers don't support. If you get an error, switch to chat completions via `set_default_openai_api("chat_completions")` or use `OpenAIChatCompletionsModel`.
- Structured output errors: Some models support JSON output but not custom schemas. You might see a 400 error about `'response_format.type' not allowed`. This is a limitation of the provider; for now, avoid structured schema outputs on those providers.
- Feature differences: Different providers have different capabilities (e.g. image input, retrieval, special tools). Don't send unsupported tools or data; for example, don't send an image to a text-only model. If mixing providers, filter out unsupported features.
Using any model via LiteLLM¶
LiteLLM integration lets you pick from 100+ models easily via the LitellmModel
class.
Setup:
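```bash
pip install "openai-agents[litellm]"
```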
Then:
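A sketch (the model name and API key are placeholders):

```python
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

agent = Agent(
    name="Assistant",
    instructions="Be concise.",
    model=LitellmModel(
        model="anthropic/claude-3-5-sonnet-20240620",  # example model name
        api_key="YOUR_ANTHROPIC_API_KEY",              # placeholder
    ),
)

result = Runner.run_sync(agent, "What is the capital of France?")
print(result.final_output)
```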
When you use LitellmModel, you can pass any supported model name together with the matching API key. For example, you could use google/gemini-1.5-flash with your Google key, or anthropic/claude-3-5-sonnet-20240620 with your Anthropic key. LiteLLM supports many providers (see https://openai.github.io/openai-agents-python/models/litellm/); it wraps them under the hood so you can use them like an OpenAI model.
Configuring the SDK¶
API Keys and Clients¶
By default, the SDK looks for the OPENAI_API_KEY
environment variable for making model requests and for sending traces. This happens as soon as the SDK is imported. If you can’t set the environment variable early, you can manually set the key in code:
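```python
from agents import set_default_openai_key

set_default_openai_key("sk-...")  # replace with your key
```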
This will override the environment variable for model calls. By default, this key is also used for tracing. If you want to use a different key for tracing, see below.
Tracing¶
Tracing is on by default. It uses your OpenAI API key for sending traces. If you want to explicitly set the tracing key (maybe different from your model key):
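```python
from agents import set_tracing_export_api_key

set_tracing_export_api_key("sk-...")  # key used only for uploading traces
```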
To disable tracing entirely:
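```python
from agents import set_tracing_disabled

set_tracing_disabled(True)
```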
Client¶
You can also set a custom OpenAI client. The SDK uses an AsyncOpenAI
client by default. To use a custom one:
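A sketch (the base URL and key are placeholders for an OpenAI-compatible endpoint):

```python
from openai import AsyncOpenAI
from agents import set_default_openai_client

custom_client = AsyncOpenAI(base_url="https://example.com/v1", api_key="...")
set_default_openai_client(custom_client)
```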
Finally, you can switch which OpenAI API is used. By default it uses the Responses API. To force the classic Chat API:
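```python
from agents import set_default_openai_api

set_default_openai_api("chat_completions")
```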
Debug Logging¶
The SDK has built-in loggers (no handlers by default). Warnings/errors go to stdout, but other logs are hidden unless you enable them.
To turn on verbose logging to stdout:
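```python
from agents import enable_verbose_stdout_logging

enable_verbose_stdout_logging()
```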
Sensitive Data in Logs¶
Some log messages may include user data or model inputs/outputs. To disable logging this:
- Set
OPENAI_AGENTS_DONT_LOG_MODEL_DATA=1
in your environment to hide model prompts/responses. - Set
OPENAI_AGENTS_DONT_LOG_TOOL_DATA=1
to hide tool inputs/outputs.
Agent Visualization¶
You can visualize how your agents, tools, and handoffs connect using Graphviz. The draw_graph
function creates a directed graph:
- Agents are yellow rectangles.
- Tools are green ellipses.
- Handoffs are solid arrows between agent boxes.
- Tool calls are dotted arrows from agents to tools.
- There is a special start node
__start__
and an end node__end__
.
Installation¶
Install the optional visualization tools:
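```bash
pip install "openai-agents[viz]"
```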
Generating a Graph¶
Use draw_graph(root_agent)
to make the graph. Example:
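A sketch matching the description below (agent names and the tool are illustrative):

```python
from agents import Agent, function_tool
from agents.extensions.visualization import draw_graph

@function_tool
def get_weather(city: str) -> str:
    """Return a short weather report."""
    return f"The weather in {city} is sunny."

spanish_agent = Agent(name="spanish_agent", instructions="You only speak Spanish.")
english_agent = Agent(name="english_agent", instructions="You only speak English.")

triage_agent = Agent(
    name="triage_agent",
    instructions="Hand off to the appropriate agent based on language.",
    handoffs=[spanish_agent, english_agent],
    tools=[get_weather],
)

draw_graph(triage_agent)
```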
This will display an inline image (in a notebook or supported environment) showing triage_agent
connected to spanish_agent
and english_agent
, with get_weather
as a tool.
Understanding the Graph¶
In the generated graph:
- The start node (
__start__
) shows the entry point. - Yellow boxes are agents.
- Green ellipses are tools.
- Solid arrows show agent-to-agent handoffs.
- Dotted arrows show agent-to-tool calls.
- The end node (
__end__
) shows termination.
This helps you see at a glance how your agents and tools are structured.
Customizing the Graph¶
By default, draw_graph()
displays the graph inline. You can also open it in a separate window or save it to a file:
- To open in a window (for example, if running locally):
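```python
draw_graph(triage_agent).view()
```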
- To save to a file:
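```python
draw_graph(triage_agent, filename="agent_graph")
```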
This creates agent_graph.png
(or .pdf
if you specify) in your working directory.