Published on 00/00/0000
Last updated on 00/00/0000
The development of agentic systems introduces a new layer of complexity due to the autonomous and interactive nature of AI agents. Traditional software design patterns, while applicable, often fall short in handling the dynamic and decentralized characteristics of agentic frameworks.
Traditional programming paradigms are largely derived from decades-old models and assumptions. They struggle to accommodate the fluid, adaptive, and often highly unpredictable nature of multi-agent applications. In a time when adaptability and scalability are key, this blog explores the interrelated challenges of these programming paradigms and presents design patterns needed for effective agentic applications.
Classic approaches assume a known sequence of operations, stable interfaces, and a predictable environment. In contrast, generative AI (GenAI) agents operate in dynamic contexts where their behavior emerges from interactions with each other and external data sources. Traditional paradigms are not designed to handle outcomes that cannot be neatly predicted in advance, making it cumbersome to adapt to situations where agents learn and evolve on the fly.
Traditional software engineering relies heavily on well-defined APIs, structured data schemas, and rigid communication protocols. Multi-agent GenAI applications, however, often require agents to negotiate or debate on the spot, dynamically refine their reasoning strategies, and reinterpret or transform data representations. This fluidity is challenging to represent or maintain in a rigidly typed, statically defined programming model.
Many existing programming models assume a single controlling logic that directs all components. GenAI multi-agent systems, however, are inherently decentralized. Each agent may have its own goals, learning algorithms, and decision-making processes, including its choice of large language model (LLM). Managing a set of independently evolving agents—each potentially running different models and strategies—calls for abstractions that support autonomy, collaboration, and negotiation rather than a top-down orchestration.
Traditional paradigms, especially imperative and object-oriented models, tend to focus on synchronous call-and-response patterns. GenAI agents often interact concurrently, exchanging messages, updating internal states, and adapting strategies in real-time. Handling concurrent, asynchronous workflows is difficult in older paradigms, which are not inherently designed to reason about complex temporal dynamics or partial information.
Classic software tends to encode logic in a static form—once compiled or deployed, the logic and decision structures remain stable unless explicitly updated by developers. GenAI agents continuously learn from new data, refine their internal models, and change their strategies based on evolving contexts. This requires programming abstractions that integrate learning loops, probabilistic reasoning, and adjustable model states directly into the development and runtime environments.
Traditional systems rely on well-defined contracts where errors and exceptions are anticipated and can be handled with known recovery strategies. GenAI systems, on the other hand, operate under uncertainty and incomplete information. Agents might provide probabilistic answers, contextually informed guesses, or heuristic-based decisions. Handling uncertainty, partial knowledge, and non-deterministic errors calls for paradigms that can gracefully incorporate statistical reasoning and robust fallback mechanisms rather than rely solely on explicit error conditions.
Traditional programming tools excel at symbolic reasoning and deterministic logic flows. By contrast, GenAI agents often combine symbolic reasoning with statistical, gradient-based, or probabilistic methods. Achieving synergy between traditional code structures and advanced AI models (e.g., LLMs, reinforcement learning agents) demands a paradigm that can treat these models as first-class citizens, integrating their non-linear, probabilistic reasoning into the application’s architecture.
In essence, traditional programming paradigms are rooted in predictability, static structures, and top-down design. They are fundamentally misaligned with the inherently adaptive, emergent, and probabilistic nature of GenAI multi-agent ecosystems.
To build effective GenAI multi-agent applications, we need to incorporate uncertainty, concurrency, continuous learning, dynamic negotiation, and decentralized decision-making as first-order design principles.
As a result, several design patterns specific to agent systems have emerged, focusing on areas such as asynchronous tool orchestration, state management, failure handling, and adaptive goal reassignment. These patterns enable developers to structure agent-based systems more effectively, ensuring that they are both scalable and robust in production environments.
Agents frequently need to interact with external tools, services, or APIs to complete their tasks. This requires managing multiple asynchronous calls, which introduces complexity in ensuring tool availability, coordinating results, and handling failures. Two patterns are commonly employed to manage these interactions:
Chained tool orchestration
In this pattern, agents invoke tools sequentially, passing the output of one tool as input to the next. Because each step depends on the previous step’s result, this approach is straightforward to implement. However, it can introduce bottlenecks if tools take a long time to run, as all subsequent steps must wait for the current step to complete. Chained orchestration is best suited for workflows where each step must be completed before the next can begin.
For example, consider an agent that troubleshoots IP Access Lists. This agent uses two tools:
- A Retriever, which fetches the relevant configuration data from the router.
- A Troubleshooter, which analyzes the retrieved configuration to identify issues.
Here, the Troubleshooter relies on the configuration data retrieved by the Retriever, so a chained approach is appropriate.
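The chained pattern can be sketched as follows. The tool bodies are hypothetical stubs standing in for real retrieval and analysis logic; the point is the strict sequencing.

```python
# Minimal sketch of chained tool orchestration: the Troubleshooter consumes
# the Retriever's output, so the two calls must run sequentially.
# retrieve_config and troubleshoot_acl are illustrative stand-ins.

def retrieve_config(router: str) -> dict:
    """Tool 1: fetch the router's configuration (stubbed here)."""
    return {"router": router, "acl": ["permit ip 10.0.0.0/8", "deny ip any"]}

def troubleshoot_acl(config: dict) -> str:
    """Tool 2: analyze the retrieved access list (stubbed here)."""
    issues = [] if config["acl"] else ["empty access list"]
    return f"{config['router']}: {'no issues found' if not issues else issues}"

def chained_run(router: str) -> str:
    config = retrieve_config(router)   # step 1 must finish first...
    return troubleshoot_acl(config)    # ...before step 2 can start
```

The simplicity is the appeal: each step's input is the previous step's output, so there is no concurrency to manage, at the cost of total latency being the sum of all steps.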
Parallel tool orchestration
In scenarios where tool executions do not depend on one another, agents can invoke multiple tools in parallel. This approach reduces overall execution time by leveraging concurrent processing. However, it requires careful management of concurrency, error handling, and data synchronization. Once all parallel operations have completed, their results are aggregated for further analysis or decision making.
Extending the previous example, suppose the agent needs to troubleshoot access lists on multiple routers. The Retriever tool can be called in parallel for each router because these operations are independent. Then, as soon as a router’s configuration is retrieved, the Troubleshooter can be invoked to process that specific configuration without waiting for the others, further improving efficiency.
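A minimal sketch of this multi-router case using `asyncio`, with the tool bodies again stubbed: each router's retrieve-then-troubleshoot pipeline stays internally chained, but the pipelines themselves run concurrently.

```python
import asyncio

# Sketch of parallel tool orchestration across independent routers.
# Tool bodies are illustrative stubs that simulate I/O latency.

async def retrieve_config(router: str) -> dict:
    await asyncio.sleep(0.01)          # simulate a network call
    return {"router": router, "acl": ["deny ip any"]}

async def troubleshoot_acl(config: dict) -> str:
    await asyncio.sleep(0.01)
    return f"{config['router']}: analyzed {len(config['acl'])} ACL entries"

async def pipeline(router: str) -> str:
    # Within one router the two steps remain chained...
    return await troubleshoot_acl(await retrieve_config(router))

async def run_all(routers: list[str]) -> list[str]:
    # ...but the per-router pipelines execute concurrently.
    return await asyncio.gather(*(pipeline(r) for r in routers))

results = asyncio.run(run_all(["r1", "r2", "r3"]))
```

`asyncio.gather` preserves input order in its results, which simplifies the aggregation step; error handling (e.g., `return_exceptions=True`) is where the extra concurrency-management cost shows up.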
When integrating tool or function calling into an application through an API such as OpenAI's—rather than making these calls in a traditional, hard-coded manner—developers gain flexibility, adaptability, and the cognitive leverage provided by LLMs.
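With this approach, a tool is described declaratively and the model decides when to invoke it. A sketch of a tool definition in the OpenAI Chat Completions function-calling format is shown below; the tool name and parameters are illustrative.

```python
# Sketch of a declarative tool definition (OpenAI function-calling format).
# The model, not the application code, decides when to call it.

retriever_tool = {
    "type": "function",
    "function": {
        "name": "retrieve_config",
        "description": "Fetch the running configuration of a router.",
        "parameters": {
            "type": "object",
            "properties": {
                "router": {"type": "string", "description": "Router hostname"},
            },
            "required": ["router"],
        },
    },
}

# Passed as e.g. client.chat.completions.create(..., tools=[retriever_tool]);
# the model responds with a tool call naming the function and JSON arguments,
# which the application executes and feeds back into the conversation.
```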
In multi-agent systems (MAS), state refers to the comprehensive set of variables that define the condition of both individual agents and the overall system at any given time. This includes an agent’s internal parameters, such as beliefs, goals, and knowledge, as well as the shared attributes of the system’s environment. According to LangGraph documentation, state can be broadly categorized into individual agent state (specific to each agent) and global system state (the collective state of all agents and their environment).
Effective state management is fundamental to multi-agent systems. It underpins the ability of agents to make decisions, coordinate, adapt, and collectively function in complex environments. Several patterns are commonly used to handle agent state:
In an ephemeral state pattern, agents do not retain any information once a task is completed. Each new interaction is treated as a stateless operation, which simplifies the system’s architecture and is suitable for short-lived or simple tasks. However, it limits the agent’s ability to provide continuity for longer or more complex workflows.
Example: After the Router Agent finishes suggesting configuration improvements, it immediately discards the retrieved router configuration. Thus, no historical information is retained for future interactions.
For agents that must maintain context over multiple interactions, a persistent state pattern is used. Agent data is stored in external databases or in-memory data stores, allowing the agent to recall previous information for subsequent tasks. This pattern is especially useful in multi-turn dialogue systems, where retaining context across multiple conversations is essential.
Example: If the Router Agent offers a chat interface in which the user can ask follow-up questions about the router configuration across multiple sessions, the configuration data would be stored in a database. This allows the agent to retrieve and reference the information as needed, even long after the initial interaction.
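A minimal sketch of the persistent pattern: configurations are saved keyed by user and router, so a later session can recall them. The dict-backed store is a stand-in for a real database or key-value store such as Redis or Postgres.

```python
# Sketch of a persistent state pattern for the Router Agent.
# The in-memory dict stands in for an external database.

class ConfigStore:
    def __init__(self):
        self._db = {}   # replace with a real database client in production

    def save(self, user_id: str, router: str, config: dict) -> None:
        self._db[(user_id, router)] = config

    def load(self, user_id: str, router: str):
        return self._db.get((user_id, router))

store = ConfigStore()
store.save("alice", "r1", {"acl": ["deny ip any"]})

# In a later session, the agent recalls the earlier context:
recalled = store.load("alice", "r1")
```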
A state caching pattern provides a hybrid approach, storing data only for the duration of a session and discarding it afterward. This strategy can boost performance by reducing the overhead of frequent reads and writes to an external store while avoiding the complexities of managing long-term data persistence.
Example: In this case, the Router Agent retains the router configuration in memory for the length of a single chat session. The user can pose multiple questions about the configuration during that session, but once the chat ends, the cached data is removed.
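The caching pattern can be sketched as a session object that fetches the configuration once, serves it from memory for the rest of the session, and discards it when the session ends. The retrieval is stubbed for illustration.

```python
# Sketch of session-scoped state caching: data lives only for one chat session.

class ChatSession:
    def __init__(self, router: str):
        self.router = router
        self._cache = {}                  # session-local, never persisted

    def get_config(self) -> dict:
        if "config" not in self._cache:   # fetch once per session...
            self._cache["config"] = {"acl": ["deny ip any"]}  # stubbed fetch
        return self._cache["config"]      # ...then serve from memory

    def end(self) -> None:
        self._cache.clear()               # cached state discarded at session end

session = ChatSession("r1")
first = session.get_config()
second = session.get_config()   # served from cache, no second retrieval
session.end()
```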
Agents can encounter unexpected scenarios such as unresponsive tools, partial failures, or performance degradation. Failsafe strategies ensure that agents remain robust and continue functioning despite these challenges.
In a fallback strategy, agents have backup options when they fail to complete a task. For example, if an agent encounters an error while invoking a tool, it can retry or switch to a backup tool. This approach keeps the system resilient even when individual components fail.
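A sketch of this strategy: retry the primary tool a bounded number of times, then switch to a backup. The tool functions are illustrative stand-ins.

```python
# Sketch of a fallback strategy: bounded retries, then a backup tool.

def call_with_fallback(primary, backup, arg, retries: int = 2):
    for _ in range(retries):
        try:
            return primary(arg)
        except Exception:
            continue               # retry the primary tool
    return backup(arg)             # all retries failed: fall back

def flaky_tool(x):
    raise TimeoutError("primary tool unavailable")   # always fails here

def backup_tool(x):
    return f"backup handled {x}"

result = call_with_fallback(flaky_tool, backup_tool, "query")
```

In a real agent, the `except` clause would typically log the failure and distinguish retryable errors (timeouts, rate limits) from permanent ones.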
Note: In practice, LLMs sometimes call the wrong tool for reasons beyond the scope of this blog. Nevertheless, it is crucial to detect and, if possible, recover from such issues without user intervention.
When an agent experiences a partial failure or reduced performance, it should continue to operate in a limited capacity. This ensures that even if optimal functionality is not possible, the agent still provides a meaningful output rather than stopping altogether.
Example: Long-running requests can cause an LLM to take several minutes to generate a response. Strategies to handle this situation include:
- Streaming partial results to the user as they become available (OpenAI Streaming API).
- Prompting the user to decide whether to wait for completion or modify the query.
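The streaming option can be sketched as follows. The generator below is a stand-in for a real streaming LLM API (for example, the OpenAI SDK yields incremental chunks when called with `stream=True`); the point is that each chunk is surfaced to the user as it arrives rather than after the full response completes.

```python
# Sketch of streaming partial results to the user. fake_llm_stream is an
# illustrative stand-in for a streaming LLM API.

def fake_llm_stream(prompt: str):
    for token in ["Checking", " the", " access", " list..."]:
        yield token                  # each chunk arrives incrementally

def stream_to_user(prompt: str) -> str:
    shown = []
    for chunk in fake_llm_stream(prompt):
        shown.append(chunk)          # in a UI, display the chunk immediately
    return "".join(shown)

answer = stream_to_user("troubleshoot r1")
```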
To prevent agents from becoming stuck when tools are slow or unresponsive, timeouts should be enforced on tool executions. If a tool fails to respond within a specified time, the agent can either retry or switch to a fallback tool.
Note: Timeout management is a well-known practice in network programming and is especially important for agents providing chat interfaces. Rate limits, token limits, and external API constraints (e.g., Spotify or Netflix rate limits) underscore the need for robust timeout and recovery mechanisms.
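A minimal sketch of timeout enforcement using `asyncio.wait_for`: if the tool does not answer within the budget, the agent falls back instead of hanging. The slow tool is simulated.

```python
import asyncio

# Sketch of enforcing a timeout on a tool call, with a fallback on expiry.

async def slow_tool() -> str:
    await asyncio.sleep(10)          # simulates an unresponsive tool
    return "primary result"

async def call_with_timeout(timeout: float = 0.05) -> str:
    try:
        return await asyncio.wait_for(slow_tool(), timeout=timeout)
    except asyncio.TimeoutError:
        return "fallback result"     # retry or switch to a backup tool here

result = asyncio.run(call_with_timeout())
```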
Agents often operate in environments where task conditions or goals may change in real-time. To adapt to these changes, agents must be able to reassign goals dynamically based on new inputs or external factors. Two main patterns are used for dynamic goal reassignment: goal reevaluation and task delegation.
To illustrate how goal reassignment operates in practice, let’s consider some examples.
Example 1: Real-time data analysis
Scenario: A user chats with a “DataBot” to quickly analyze a CSV file of daily sales.
Why it matters: Within a single conversation, the agent repeatedly adapts to new objectives and tool failures, ensuring the user quickly gets the desired output without breaking continuity.
Example 2: Quick restaurant reservation and event planning
Scenario: A user employs a personal “evening out” agent to plan a date night within minutes.
Why it matters: In the span of a few messages, the agent’s goals pivot from booking Italian food to sushi, then to finding a movie. Task delegation to a specialized agent happens instantly, all in one session.
Example 3: Live coding assistant
Scenario: A user interacts with a coding assistant to generate, debug, and refactor code in a single chat.
Why it matters: All this happens in one conversation, demonstrating real-time adaptation to new goals and the handing off of tasks that require specialized capabilities.
Example 4: Live weather and travel assignments
Scenario: A user interacts with a “Weather & Travel Bot” to plan a same-day trip.
Why it matters: The user changes objectives multiple times in a short interaction, showcasing how an agent must pivot instantly rather than following an extended, multi-day planning pipeline.
Key takeaways:
- Goal reevaluation can happen mid-conversation, letting agents instantly adapt to user-driven changes or tool failures.
- Task delegation doesn’t have to be part of a large-scale, multi-agent system; it can occur in brief sessions when a specialized micro-agent or service is needed to handle a subtask.
- Even short user interactions can benefit from dynamic goal reassignment, ensuring that agents remain flexible and responsive in real time.
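The goal-reevaluation loop running through these examples can be sketched as an agent that re-reads its active goal on every user turn, so a mid-conversation change ("actually, sushi instead") takes effect immediately rather than after the current plan finishes.

```python
# Sketch of mid-conversation goal reevaluation: the active goal is
# re-checked on every incoming event and reassigned when it changes.

def run_agent(goal_stream):
    goal, history = None, []
    for event in goal_stream:          # each user turn may change the goal
        if event != goal:
            goal = event               # reassign the active goal
            history.append(f"now pursuing: {goal}")
        # ...otherwise continue executing the current plan
    return history

log = run_agent(["book italian", "book italian", "book sushi", "find movie"])
```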
Agent-oriented design patterns are essential tools for developers looking to fully understand and master agentic systems. By leveraging patterns for tool orchestration, state management, fail-safe design, and dynamic goal reassignment, developers can create systems that are both scalable and robust.
This blog is part of our series, Agentic Frameworks, a culmination of extensive research, experimentation, and hands-on coding with over 10 agentic frameworks and related technologies. Explore more insights by reading the other blogs in this series.