
Old Prompt vs New Prompt System: Building Better AI Workflows

Discover how multi-agent AI systems outperform single-stage prompting. Learn to build intelligent AI orchestrators that route tasks to specialized agents for superior business results and precision.

The Future of AI: From Single Prompts to Intelligent Multi-Agent Orchestration


The majority of AI users rely on what’s called single-stage prompting, where they craft prompts with care, generally using some type of prompt engineering structure or template that outlines the task, context, role, voice, and examples for the desired output format.

While knowing how to give instructions to an LLM is very useful, it’s only part of a much bigger process. Even those who are best at communicating their instructions to AI are still just communicating directly to a single general-purpose LLM.

The problem with this approach is that even the most sophisticated users are simply engineering their prompts as inputs and then relinquishing control to the general-purpose model after that.

While this approach is fine for most consumer AI use cases or simple applications, it’s generally not enough for more specialized business use cases.

So in this tutorial, I’ll show you how we can change our prompt pipeline from traditional single-stage prompting to advanced multi-agent systems.

Traditional Single-Stage Prompting vs Multi-Agent AI Systems

Traditionally, if you wanted your LLM to do many different things, you had two main options. The first was to ask a general-purpose AI and get decent results. This is what most people do today, but the problem with this approach is that it’s not optimized for any individual workflow. It’s a jack-of-all-trades approach: your AI can probably accomplish the task, but the outputs, although likely decent, will lack precision.

The second option was to silo those different tasks as separate processes: one AI agent for creating images, another for sending emails, another for coming up with social posting ideas, another to help you strategize or research. While this would give you the desired results for each task, it creates additional complexity, because the user now needs to access each of these agents as an individual tool. You’d have one tool with its own UI for research, another tool to help you code, another tool to generate images, and so on.

A third approach is to de-silo these agents, bring multiple tasks into one pipeline, and route each task using an orchestrator agent. Although this approach takes more time to set up, it generates substantially stronger results, and multiple agents, tools, and processes can all be accessed through a single pipeline.

Why Multi-Agent AI Systems Work: The Human Analogy

Looking at this workflow, think about how you naturally switch between different roles throughout your day—you might write professional reports at work, then tell funny stories with friends later that evening. You don’t need to become a different person; you just adapt internally based on the context.

This new AI approach works the same way: instead of having separate agents for different tasks, we create one intelligent system that can internally compartmentalize and switch between specialized modes—like having a social media expert, a coding specialist, and a strategy consultant all within the same agent, each with their own expertise and voice.

How AI Orchestrator Agents Transform Task Management

One way to mimic this logic is to use an orchestrator agent. An orchestrator agent is simply an AI control system that receives incoming requests and intelligently routes them to the most appropriate specialized AI component or pipeline based on the nature of the task.
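
To make this concrete, here’s a minimal Python sketch of an orchestrator. The pipeline names are placeholders, and the keyword-based classifier is a stand-in for what would normally be an LLM call or a trained classifier:

    # Minimal orchestrator sketch. The pipelines and the keyword
    # classifier are illustrative placeholders; in production the
    # routing decision would come from an LLM or trained classifier.

    def social_media_pipeline(request: str) -> str:
        return "[social media pipeline] " + request

    def coding_pipeline(request: str) -> str:
        return "[coding pipeline] " + request

    def strategy_pipeline(request: str) -> str:
        return "[strategy pipeline] " + request

    PIPELINES = {
        "social_media": social_media_pipeline,
        "coding": coding_pipeline,
        "strategy": strategy_pipeline,
    }

    def classify(request: str) -> str:
        """Decide which specialized pipeline should handle the request."""
        text = request.lower()
        if any(word in text for word in ("post", "tweet", "caption")):
            return "social_media"
        if any(word in text for word in ("code", "function", "bug")):
            return "coding"
        return "strategy"  # default route

    def orchestrate(request: str) -> str:
        return PIPELINES[classify(request)](request)

    print(orchestrate("Draft a caption for our product launch"))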

What’s fascinating about this new approach is that we can get very granular with our control over each step of the process.

Previously, using a single-stage prompt, the onus was on the user to optimize the prompt to achieve the desired output. However, with this new approach, we can bake a lot of the prompt engineering into downstream components. For example, if the orchestrator agent determined that the input was requesting a social media post, the request could be routed to the social media pipeline.

From here, the AI agent that takes on the request is the expert in this particular task. It would have a baked-in system prompt defining its voice and its governing rules; for example, the system prompt might specify the content pillars the agent must cover. The agent might also already have access to optimized content formats or examples. For us as systems designers, this means the initial prompt doesn’t carry the same prompt engineering burden, because much of that engineering happens within these downstream internal components, which have been pre-optimized to fulfill their tasks.
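
As a rough sketch, those governing rules can live in a fixed system prompt that ships with the pipeline. The voice and content pillars below are invented for illustration, and the role/content message format is just the common chat-completion convention:

    # Sketch: the social media specialist's voice and rules are baked
    # in as a fixed system prompt; only the user request varies.
    SOCIAL_MEDIA_SYSTEM_PROMPT = (
        "You are a social media expert. Voice: confident, concise, friendly.\n"
        "Stay within these content pillars: product education, customer "
        "stories, and industry trends.\n"
        "Always end with a call to action."
    )

    def build_messages(user_request: str) -> list[dict]:
        """Pair the baked-in system prompt with the raw user request."""
        return [
            {"role": "system", "content": SOCIAL_MEDIA_SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ]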

These pipelines could even use prompt rewriters or query reformulation: components that take a user input and make the input itself stronger. For example, they could transform a vague input into a well-structured, detailed query that yields a more accurate AI response.
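
A rewriter can be a single extra LLM pass that expands a vague request before it reaches the specialist. In the sketch below, complete() is a stub standing in for whichever model call you actually use:

    # Sketch of a prompt rewriter / query reformulator.
    def complete(prompt: str) -> str:
        # Stub so the example runs; swap in a real LLM call here.
        return prompt

    REWRITE_INSTRUCTIONS = (
        "Rewrite the user's request as a detailed, well-structured query. "
        "Make the goal, audience, output format, and constraints explicit. "
        "Return only the rewritten query.\n\nUser request: "
    )

    def rewrite_query(raw_input: str) -> str:
        """Turn a vague input into a stronger, more specific query."""
        return complete(REWRITE_INSTRUCTIONS + raw_input)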

Advanced Pipeline Components for Intelligent AI Processing

What’s also interesting about this approach is that the agent doesn’t need to immediately generate a response.

Intelligent Classification Systems

Perhaps our response would improve if the AI augmented the user input with some type of external data. But before we retrieve that data, we might want to know if data retrieval is even necessary. So before we attempt retrieval, we could use a classifier. For example, if someone just says “hello,” there’s no need for RAG (Retrieval-Augmented Generation). This classifier acts as a gatekeeper, ensuring we only invoke expensive retrieval processes when they’ll actually improve the output. This saves computational resources and reduces latency for simple queries.
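
A minimal version of that gatekeeper might look like the sketch below. The heuristics are illustrative; in practice a small, fast classifier model would usually make this decision:

    # Sketch of a retrieval gatekeeper: only invoke RAG when the query
    # plausibly needs external data.
    GREETINGS = {"hello", "hi", "hey", "thanks", "thank you"}

    def needs_retrieval(query: str) -> bool:
        normalized = query.strip().lower().rstrip("!.?")
        if normalized in GREETINGS:
            return False   # small talk: skip the expensive retrieval step
        if len(normalized.split()) < 3:
            return False   # too short to match documents meaningfully
        return True        # default: retrieve, let the reranker filter

    print(needs_retrieval("hello"))                      # False
    print(needs_retrieval("summarize our Q3 campaign"))  # True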

If retrieval is necessary, we can branch off to either a traditional database or a vector database, using rich metadata or attributes to pull relevant data from these private data sources.
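
To illustrate the vector branch, here’s a toy in-memory version that filters on metadata attributes first and then ranks by embedding similarity. The documents, fields, and random embeddings are all placeholders; a real system would use an actual vector database:

    # Toy vector retrieval with metadata filtering; numpy cosine
    # similarity stands in for a real vector database here.
    import numpy as np

    documents = [
        {"text": "Q3 campaign results", "embedding": np.random.rand(8),
         "source": "analytics", "year": 2024},
        {"text": "Brand voice guidelines", "embedding": np.random.rand(8),
         "source": "brand", "year": 2023},
    ]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search(query_embedding, top_k=5, **filters):
        """Filter by metadata attributes first, then rank by similarity."""
        candidates = [d for d in documents
                      if all(d.get(k) == v for k, v in filters.items())]
        candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]),
                        reverse=True)
        return candidates[:top_k]

    results = search(np.random.rand(8), top_k=3, source="brand")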

From there, the request could go to a reranker, which would determine the priority and importance of any private data gathered.

The Critical Role of AI Rerankers

The reranker is crucial because initial retrieval often returns relevant but unordered results. The reranker takes these results and scores them based on relevance to the specific query, recency, authority, and other contextual factors. This ensures that the most valuable information gets prioritized in the final response.

And if any irrelevant data was pulled in during retrieval, the reranker can remove it altogether.
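
Here’s a sketch of that scoring-and-filtering step. The signals mirror the ones mentioned above (relevance, recency, authority), but the weights and the relevance floor are made-up values, not tuned ones:

    # Sketch of a reranker: score retrieved chunks on several signals,
    # sort, and drop anything below a relevance floor.
    from datetime import date

    def rerank(query_terms: set, chunks: list, floor: float = 0.2) -> list:
        def score(chunk: dict) -> float:
            words = set(chunk["text"].lower().split())
            relevance = len(query_terms & words) / max(len(query_terms), 1)
            age_years = max(date.today().year - chunk.get("year", 2000), 0)
            recency = 1.0 / (1.0 + age_years)
            authority = chunk.get("authority", 0.5)  # 0..1 trust signal
            return 0.6 * relevance + 0.2 * recency + 0.2 * authority

        scored = sorted(((score(c), c) for c in chunks),
                        key=lambda pair: pair[0], reverse=True)
        # Low scorers aren't just demoted; they're removed altogether.
        return [c for s, c in scored if s >= floor]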

Next, the input would move to a component that rewrites the prompt so it is optimized for final processing.

Specialized LLM Endpoints for Maximum AI Performance

Then from there, the request could hit a custom-trained model optimized to handle it. Each of these LLM endpoints could be an entirely different LLM trained on an entirely different dataset. So the social media expert isn’t just prompt-engineered into being an expert; the expertise is baked into the model itself through fine-tuning. This means each endpoint is optimized for the task at hand.

For example, the social media pipeline might be governed by Meta’s Llama LLM, the strategy pipeline might use ChatGPT, and the coding pipeline might use Claude. We would do this because these models score differently on different benchmarks, so we can optimize the routing to take advantage of the LLMs that perform best on specific tasks. From the user’s standpoint, they simply enter a prompt into an input field and never need to worry about branching logic or model switching; all of that complexity happens behind the scenes based on what they requested.
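
In practice, that routing decision can be as simple as a lookup from task type to the model family that benchmarks best on it. The assignments below just echo the examples in the text and would come from your own evaluations:

    # Sketch: route each task type to the model family that performs
    # best on it. These assignments mirror the examples above and
    # would be driven by your own benchmark results in practice.
    MODEL_FOR_TASK = {
        "social_media": "llama",   # Meta's Llama for social content
        "strategy": "gpt",         # ChatGPT for strategy work
        "coding": "claude",        # Claude for code generation
    }

    def pick_model(task_type: str) -> str:
        # Fall back to a general-purpose default for unknown task types.
        return MODEL_FOR_TASK.get(task_type, "gpt")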

Another major advantage of sending the request to a specialized LLM is that we don’t need to risk “lost in the middle” problems by relying on large context windows. Instead, we can use small, focused context windows with high levels of contextual relevance, because the request is going to a specialized LLM. Rather than stuffing everything into one massive prompt, sending it off to a general-purpose LLM, and hoping the model pays attention to the right parts, we deliver precisely the right information at the right time to specialized components.
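
Concretely, the pipeline can hand the specialist only the top few reranked chunks, trimmed to a small budget, rather than the whole corpus. A minimal sketch:

    # Sketch: build a small, focused context from the top reranked
    # chunks instead of stuffing everything into one huge prompt.
    def build_context(reranked_chunks: list, max_chars: int = 2000) -> str:
        parts, used = [], 0
        for chunk in reranked_chunks:   # already ordered by relevance
            text = chunk["text"]
            if used + len(text) > max_chars:
                break                   # stay within the context budget
            parts.append(text)
            used += len(text)
        return "\n\n".join(parts)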

The Revolutionary Results of Multi-Agent AI Systems

With this new system, the components are designed to work harmoniously together, each completing its task exceptionally well with high levels of accuracy. From the user’s standpoint, you no longer have to rely on a jack-of-all-trades general-purpose model for a diverse set of needs.

So if this all sounds interesting to you and you’d like to learn how to build out AI architecture like this without having to know how to code, be sure to check out our no-code AI development course. In our course, you’ll learn how to create virtually any AI application you can dream up, from AI software platforms to AI agents.
