A hands-on introduction to the various components used to build LLM-Driven Agents
John Sosoka, AI Development Lead
In a recent post, we covered concepts in LLM-driven agent design from a 10,000-ft view with no coding required. This article will dive into the basic building blocks of LLM (large language model) applications. These fundamental components can be used in combination to build Agents or enhance applications.
Throughout this article, LangChain4J will be utilized as the example LLM Integration framework as it is the most mature at this time; however, the Spring AI framework is rapidly catching up. Nonetheless, many of the core concepts are framework-agnostic.
This blog post serves as a basic introduction to these building blocks. Subsequent articles will explore these topics in more detail. Let's dive in!
Prompts & Prompt Engineering
In essence, prompts are how we instruct, inform, and use LLMs. Designing and employing prompts is at the core of building LLM applications. A prompt can be as simple as a one-off question to an LLM or as advanced as maintaining key historical data from previous execution runs and injecting specific instructions and examples.
Prompts are a surprisingly vast topic, so we will only scratch the surface in this article.
Prompt Stuffing
Before entering the world of prompt engineering, it is imperative to understand prompt stuffing, also known as "context stuffing." This is the practice of injecting (hopefully) critical information into the LLM context to help the model better solve a task.
There are many prompt stuffing mechanisms, some of which will be covered later in this article.
Prompt Templates
Prompt templates are reusable prompts with fields for inputting variable data critical to the task at hand. There are various ways to create a prompt template, and all major LLM integration frameworks provide a mechanism for it.
Let's demonstrate a simple Structured Prompt example for generating a friendly, context-aware greeting message:
@StructuredPrompt({
        "You are a friendly chat bot.",
        "Generate a friendly greeting message for the user.",
        "User Name: {{name}}",
        "Last Login: {{lastLogin}}",
        "Favorite Topics: {{topics}}"
})
@Builder
public class GreetingPrompt {

    private String name;
    private String lastLogin;
    private List<String> topics;
}
The code above shows how values can be injected into our prompt template. The @StructuredPrompt annotation fills the template fields and allows us to turn the final result into different message types, such as an AiMessage, SystemMessage, UserMessage, or a plain String.
@SpringBootTest
public class PromptTemplateTests {

    @Autowired
    ChatLanguageModel chatLanguageModel;

    @Test
    public void testPromptTemplate() {
        GreetingPrompt greetAlice = GreetingPrompt.builder()
                .name("Alice")
                .lastLogin("2023-11-01")
                .topics(List.of("Gardening", "Travel"))
                .build();

        Prompt alicePrompt = StructuredPromptProcessor.toPrompt(greetAlice);

        // Greet Alice
        System.out.println(chatLanguageModel.generate(alicePrompt.text()));
        System.out.println("=============================================");

        GreetingPrompt greetJohn = GreetingPrompt.builder()
                .name("John")
                .lastLogin("2024-03-01")
                .topics(List.of("Backpacking", "Music"))
                .build();

        Prompt johnPrompt = StructuredPromptProcessor.toPrompt(greetJohn);

        // Greet John
        System.out.println(chatLanguageModel.generate(johnPrompt.text()));
    }
}
Executing the test above shows that the GreetingPrompt can be reused effectively for multiple users:
Hello Alice! Welcome back to our chat. It's always great to see you here. I see that you enjoy discussing topics like gardening and travel - two wonderful interests! How can I assist you today?
=============================================
Hello John! Welcome back to our friendly chat platform. It's great to see you again. I hope you're ready for some interesting conversations on your favorite topics of backpacking and music. Let's dive in and have a great chat!
This was a particularly simple example. In more sophisticated use cases, prompts could include specific instructions or guide domain-specific Agents through different tasks.
Foundational Prompt Operations
There are three foundational categories that encompass all prompt operations. Knowing these categories can be helpful when designing your LLM-driven application.
Operation | Description | Model Preferences |
---|---|---|
Generative | Generative prompt operations have larger outputs than inputs, including activities such as planning or brainstorming. | Generally, higher parameter models are more effective with generative operations, particularly for planning. |
Reductive | Reductive operations have outputs that are smaller than the inputs. These operations encompass tasks like summarization, extraction, and critiquing. | Lower parameter models perform fairly well with reductive operations like extraction and summarization; however, higher parameter models are better suited for other reductive operations like critiquing or analyzing. |
Transformative | Transformative prompt operations have nearly the same size outputs and inputs. These operations include reformatting, translation, and refactoring. | Higher parameter models are better for tasks like refactoring while lower parameter models can hold their own with simpler transformations like reformatting. |
Prompt Engineering
Prompt engineering is an emerging field focused on developing and optimizing prompts for utilizing LLMs in a wide variety of use cases. This process goes far beyond simply crafting text or prompt templates for LLMs. Instead, prompt engineering can include writing code to implement more sophisticated strategies (like Retrieval Augmented Generation, covered later in this article). In this section, we will cover two introductory prompt engineering strategies.
Zero-Shot Prompting
Zero-shot prompting is something you may have already done without realizing it had a name. With zero-shot prompting, we simply give the LLM a task with zero examples and hope that it works. A popular example of zero-shot prompting is sentiment analysis:
Example Prompt
Classify the sentiment of the following passage as either neutral, negative, or positive.
Passage: Last weekend, I tried Tango's for the first time. Their empanadas were amazing!
Example Output
positive
Zero-shot prompts are clearly very rudimentary, and depending on the use case, they are generally the least effective. Still, being able to classify our strategies, both good and bad, is essential to building more effective applications.
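As a concrete sketch of how a zero-shot prompt might be wired into an application, here is a minimal example using the LangChain4J AiServices mechanism covered later in this article. The SentimentClassifier interface and Sentiment enum are hypothetical names used purely for illustration:

public enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE }

public interface SentimentClassifier {

    // Zero-shot: the task is described, but no examples are provided
    @UserMessage("Classify the sentiment of the following passage as POSITIVE, NEUTRAL, or NEGATIVE: {{passage}}")
    Sentiment classifySentiment(@V("passage") String passage);
}

// Wiring it up and calling it
SentimentClassifier classifier = AiServices.create(SentimentClassifier.class, chatLanguageModel);
Sentiment sentiment = classifier.classifySentiment(
        "Last weekend, I tried Tango's for the first time. Their empanadas were amazing!");
System.out.println(sentiment); // POSITIVE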
Few-Shot Prompting
Few-shot prompting is the next step up from zero-shot prompting. With few-shot prompting, we can take advantage of an LLM's remarkable capacity for "in-context" learning. While LLMs are pre-trained and immutable outside of the training process, they still have an incredible capacity to apply new information in real time via in-context learning. A "shot" is simply an example provided to the LLM.
Consider the following example in which we introduce an imagined word and task the LLM to apply it to a new task:
Example Prompt
A whirlweree is a newly discovered plant that can cause an allergic reaction in some people (similar to something like almonds).
You are tasked with identifying potentially hazardous ingredients in a recipe. Please only return the flagged ingredients from the following list:
- salt, sugar, baking soda, walnuts, whirlweree, banana.
Example Output
walnuts, whirlweree
The above example is pretty silly, but few-shot prompting can be enormously powerful when introducing an LLM to new concepts or squeezing more performance out. If you find yourself developing an LLM-driven application and the behavior isn't consistently what you expect, consider employing a few-shot prompting technique to provide some examples and leverage this technology's in-context learning ability.
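In code, few-shot examples are often simply baked into the prompt. Here is a minimal sketch using LangChain4J's @SystemMessage and @UserMessage annotations; the TicketClassifier interface and the example tickets are hypothetical and only meant to show the shape of the technique:

public interface TicketClassifier {

    @SystemMessage({
            "You classify customer support tickets as BILLING, TECHNICAL, or OTHER.",
            "Here are a few examples (the 'shots'):",
            "Ticket: 'I was charged twice this month.' -> BILLING",
            "Ticket: 'The app crashes when I open settings.' -> TECHNICAL",
            "Ticket: 'Do you have an office in Denver?' -> OTHER"
    })
    @UserMessage("Ticket: {{ticket}}")
    String classifyTicket(@V("ticket") String ticket);
}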
Structured Outputs & Output Parsers
While there are various starting points for exploring the fundamental components of an LLM application, structured outputs and output parsers are vital for seamlessly integrating LLMs into your application.
Structured Outputs
A structured output is the LLM’s ability to respond in a way that conforms to a pre-defined schema specification. This is a critical component in building an LLM application. If an LLM can reliably return generated text that conforms to a schema, we can deserialize it into an object within our application and easily hook it into other code processes.
You can experiment with an LLM's capacity to produce structured outputs by chatting with one in a web browser. Provide a schema that the response must conform to along with a task, then check whether the generated text matches the desired specification.
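For example, a quick experiment in a chat UI might look something like this (the exact response will vary by model):

Example Prompt
Respond only with JSON matching this schema: {"title": string, "ingredients": [string], "prepMinutes": number}. Suggest a simple weeknight dinner.

Example Output
{"title": "Garlic Butter Pasta", "ingredients": ["spaghetti", "butter", "garlic", "parmesan"], "prepMinutes": 20}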
Output Parsers
Output parsers are modules provided by the LLM integration framework that execute logic before and/or after the text generation call. The purpose of an output parser is twofold:
1. Force an LLM to conform its response to a predefined spec.
2. Parse the resulting text to your target object and return it for downstream usage in your application.
User Shopping Preference Example
I'll provide a brief example with a user preference extractor. In this demonstration, the goal is for an Agent to parse a user message into the model defined below:
@Data
@ToString
public class UserPreferences {

    private String goal;
    private List<String> foodPreferences;
    private List<String> allergies;
}
We have defined a simple UserPreferenceExtractor with a single method to extract user preferences:
public interface UserPreferenceExtractor {

    @UserMessage("Extract user preferences from {{message}}")
    UserPreferences extractUserPreferences(@V("message") String userMessage);
}
The @UserMessage annotation is a LangChain4J construct that simplifies defining prompts and providing basic instructions to our target LLM.
The UserPreferenceExtractor is initialized and becomes callable via a configuration. This example is using a LangChain4J AiService to create a proxy object from the interface, which we can then invoke:
@Configuration
public class AgentConfig {

    @Bean
    UserPreferenceExtractor userPreferenceExtractor(ChatLanguageModel chatLanguageModel) {
        return AiServices.builder(UserPreferenceExtractor.class)
                .chatLanguageModel(chatLanguageModel)
                .build();
    }
}
Now that our extractor is configured, let's test it out!
@SpringBootTest
public class PreferenceExtractorTest {

    @Autowired
    UserPreferenceExtractor userPreferenceExtractor;

    @Test
    public void testPreferenceExtractor() {
        String userMessage = """
                Hello, I am preparing for a Thanksgiving dinner. One of my guests is allergic to tree nuts.
                All of my guests like ham and turkey. I really want to impress them with a delicious meal.
                """;

        UserPreferences userPreferences = userPreferenceExtractor.extractUserPreferences(userMessage);
        System.out.println(userPreferences.toString());
    }
}
The resulting output, after calling the toString() method on the UserPreferences model, is:
UserPreferences(goal=Impress guests with a delicious Thanksgiving meal, foodPreferences=[ham, turkey], allergies=[tree nuts])
The LangChain4J output parsers provided the UserPreferences schema to the LLM, along with our userMessage and instructions to conform the response to our model. Additionally, the output parsers deserialized the generated text into our target object, making it incredibly easy to weave an LLM response into the rest of our application.
We won't be going into sophisticated use cases in this post, but there are a myriad of ways this could be utilized. For instance, we could use the parsed information to perform product searches to assist the customer. Another possible use case is saving the preference data to persistent storage, giving the user a tailored experience the next time they use our application.
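For instance, once the generated text has been deserialized, the result is just another Java object that ordinary application code can use. A brief sketch, assuming a hypothetical UserPreferencesRepository for persistence:

@Service
@RequiredArgsConstructor
public class PreferenceCaptureService {

    private final UserPreferenceExtractor userPreferenceExtractor;
    private final UserPreferencesRepository userPreferencesRepository; // hypothetical persistence layer

    public UserPreferences captureAndStore(String userId, String userMessage) {
        // Let the LLM turn free-form text into a structured object
        UserPreferences preferences = userPreferenceExtractor.extractUserPreferences(userMessage);

        // From here on it is ordinary application code: persist, search, personalize, etc.
        userPreferencesRepository.save(userId, preferences);
        return preferences;
    }
}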
Structured outputs and output parsers are fundamental building blocks in LLM applications. Once we properly utilize them, the possibilities are endless.
Function Calling
Some LLM providers support function calling. Under the hood, function calling depends on behavior similar to the structured outputs discussed above: it enables a model to request that a certain tool or function be invoked. It is our responsibility (or rather the integration framework's responsibility) to map a tool execution request to a specific piece of code in our application, invoke it, and surface the result back to the LLM so that the model can continue its task.
When a provider officially supports function calls, there will be reserved fields in the API's request/response model that:
1. Exposes a variety of functions/tools that the LLM has access to
2. Represents a function execution request (received from the LLM)
3. Conveys an execution response (the output from the invoked function)
Function calling is essential for building an Agent that can interact with and respond to its environment.
Let's explore a simple function calling example. In the following illustration, we will expose a tool that allows an LLM to look up the status of an order. First, let's stub a few tools for the LLM to utilize. I'll set up a spring component with two tools—one for looking up order details and another for checking tracking information.
@Component
@Slf4j
public class OrderTools {

    @Tool("Helpful for looking up order details (including a tracking number) for a given order ID")
    public String lookupOrder(Integer orderId) {
        log.info("Looking up order details for order ID: {}", orderId);
        if (orderId == 2549) {
            return "Order 2549 shipped on 2024-05-03 with tracking number XC123456";
        } else {
            return "Order not found";
        }
    }

    @Tool("Helpful for looking up tracking details for a given tracking number")
    public String trackingLookup(String trackingNumber) {
        log.info("Looking up tracking details for tracking number: {}", trackingNumber);
        if (trackingNumber.equals("XC123456")) {
            return """
                    Tracking number XC123456
                    Shipped on 2024-05-03
                    Last location: New York, NY
                    """;
        } else {
            return "Tracking number not found";
        }
    }
}
I've added some logging so that we can gain insights into the functions being called from the log output. The LangChain4J framework makes it incredibly easy to expose tools to an agent. Note that with the @Tool annotation, we can provide more context about how the tool might be used.
Again, we will define another assistant using an interface and a few other LangChain4J annotations to guide behavior:
public interface CustomerServiceAgent {

    @SystemMessage("You are a helpful customer service agent, tasked with answering customer questions.")
    String chat(String userMessage);
}
Additionally, we need to add another configuration for the newly minted Agent and equip it with tools and memory.
@Bean
CustomerServiceAgent customerServiceAgent(ChatLanguageModel chatLanguageModel, OrderTools orderTools) {
    return AiServices.builder(CustomerServiceAgent.class)
            .tools(orderTools)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(20))
            .chatLanguageModel(chatLanguageModel)
            .build();
}
Now we're ready to create a test scenario and see the Agent in action. We will provide a message from a user with an order number and see if the LLM can both form sensible ToolExecutionRequests and integrate the resulting ToolExecutionResponses into its answers.
@SpringBootTest
public class CustomerServiceTests {

    @Autowired
    CustomerServiceAgent customerServiceAgent;

    @Test
    public void testFunctionCalls() {
        String userMessage = "Hello, I'm trying to find the status and tracking of my order. The order number is 2549";
        System.out.println(customerServiceAgent.chat(userMessage));
    }
}
Here's the output from the test execution:
2024-05-04T13:12:13.048-06:00 INFO 28968 --- [ main] c.c.c.tools.OrderTools : Looking up order details for order ID: 2549
2024-05-04T13:12:13.912-06:00 INFO 28968 --- [ main] c.c.c.tools.OrderTools : Looking up tracking details for tracking number: XC123456
Your order with order number 2549 has been shipped with tracking number XC123456. It was shipped on 2024-05-03 and the last known location is New York, NY. If you have any more questions or need further assistance, feel free to ask!
We can see from the INFO log lines that the LLM formed reasonable requests. First, it looked up the order ID to find a tracking number. Then, the model searched for the tracking number to fetch the latest tracking details. Finally, using all of the data from the function calls, it returned a response that corresponds perfectly to our stubbed tool outputs: the package shipped on 2024-05-03 and was last seen in New York, NY.
Function calls are incredibly powerful when building LLM-driven Agents or LLM applications. These capabilities open up many opportunities and lend themselves to various new and interesting patterns, such as nested agents and smart tools, which we will explore in more depth in future articles.
Chat Memory & Context Management
In general, LLM providers do not manage or maintain the conversation context. Each API call is simply text in and text out. This means that it is the client's responsibility to manage the context of the conversation and send everything with each request.
Furthermore, LLMs do not have an unlimited context, which means developers of LLM applications must constantly balance identifying and supplying critical information to the generation model against the risk of filling the entire context window.
You may have noticed the following line in the CustomerServiceAgent configuration from the previous section:
.chatMemory(MessageWindowChatMemory.withMaxMessages(20))
This line of code equipped our Agent with an out-of-the-box LangChain4J ChatMemory. It is a very simple implementation that maintains only the last N messages of the chat context (in our case, 20 messages).
You can envision the memory as an array of messages from the LLM and the user. More sophisticated ChatMemory implementations can be created, such as having an additional, nested Agent summarize messages as they slide out of the message window, storing the summaries at a reserved index, and deleting the original messages.
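As a rough illustration of that idea, here is a minimal sketch of a summarizing memory. The ChatMemory method names shown (id, add, messages, clear) reflect the LangChain4J version used here and may differ in newer releases, and ConversationSummarizer is a hypothetical nested summarizer Agent (for example, another AiService):

// Hypothetical nested summarizer agent, e.g. implemented as another AiService
public interface ConversationSummarizer {
    String summarize(String existingSummary, String evictedMessage);
}

public class SummarizingChatMemory implements ChatMemory {

    private final int maxMessages;
    private final ConversationSummarizer summarizer;
    private final LinkedList<ChatMessage> window = new LinkedList<>();
    private String runningSummary = "";

    public SummarizingChatMemory(int maxMessages, ConversationSummarizer summarizer) {
        this.maxMessages = maxMessages;
        this.summarizer = summarizer;
    }

    @Override
    public Object id() {
        return "default";
    }

    @Override
    public void add(ChatMessage message) {
        window.add(message);
        while (window.size() > maxMessages) {
            // Fold the evicted message into a running summary instead of silently dropping it
            ChatMessage evicted = window.removeFirst();
            runningSummary = summarizer.summarize(runningSummary, evicted.text());
        }
    }

    @Override
    public List<ChatMessage> messages() {
        List<ChatMessage> context = new ArrayList<>();
        if (!runningSummary.isEmpty()) {
            // Reserved slot: the summary always travels at the front of the context
            context.add(SystemMessage.from("Summary of earlier conversation: " + runningSummary));
        }
        context.addAll(window);
        return context;
    }

    @Override
    public void clear() {
        window.clear();
        runningSummary = "";
    }
}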
Worth noting is that both ToolExecutionRequests and ToolExecutionResponses occupy the same context space as user messages and LLM responses. When designing your application, pay close attention to how you manage these.
Pagination is a common example I use when discussing limited context windows and tool/function call messages. If your Agent is using function calls to perform a paginated request, it needs to keep track of which page it is on. If the conversation grows and you delete the original tool messages, the Agent will lose track of which page it last requested and repeat the same request.
Retrieval Augmented Generation
Retrieval Augmented Generation, or RAG, was first described by Lewis et al. in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Fundamentally, RAG is the practice of identifying potentially helpful information prior to the LLM generation call and stuffing the prompt with the selected data.
The process or sub-routine that identifies relevant information is known as the "retriever." Ideally, the surfaced information will help the LLM better complete the given task.
There are as many ways to implement RAG as there are methods to implement data retrievers; however, the most common type utilizes vector embeddings/vector search.
Vector Search / Retrievers
Vector search is also commonly referred to as "semantic search." Text documents are split into "chunks" and converted into vectors of numbers using embedding models. Embedding models differ from LLMs: rather than generating text, they capture and convert natural language into a vector representation. The returned vectors are then stored in a vector database, where semantically related text is clustered together in vector space.
For example, if the text "the quick brown fox jumped over the lazy dog" was split into chunks ("the quick", "brown fox", "jumped over", and "the lazy dog"), it is likely that "brown fox" and "the lazy dog" would be stored near each other in vector space as they are semantically relevant (mammals, canines, four-legged creatures, etc.).
Once a vector database has been built from your source text, an incoming user message is converted into a vector and used to perform a semantic search against the database. The nearest neighbors returned are then stuffed into the LLM's prompt to help it answer the user's question. This is largely how "chat with your documents" features work with LLMs.
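To make that concrete, here is a minimal sketch of the retrieval side using LangChain4J's in-memory embedding store. The class and method names follow the LangChain4J version used at the time of writing and may differ in newer releases; the sample chunks and question are made up for illustration:

// An example embedding model; any EmbeddingModel implementation could be substituted
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// 1. Split source text into chunks, embed each chunk, and store the vectors
for (String chunk : List.of(
        "Our return policy allows returns within 30 days of delivery.",
        "Support hours are 9am to 5pm Mountain Time, Monday through Friday.",
        "Standard shipping takes 3-5 business days.")) {
    TextSegment segment = TextSegment.from(chunk);
    embeddingStore.add(embeddingModel.embed(chunk).content(), segment);
}

// 2. Embed the user's question and perform a semantic search for the nearest neighbors
String question = "When can I reach customer support?";
List<EmbeddingMatch<TextSegment>> matches =
        embeddingStore.findRelevant(embeddingModel.embed(question).content(), 1);

// 3. Stuff the retrieved text into the prompt before the generation call
String prompt = "Answer the question using the following context:\n"
        + matches.get(0).embedded().text()
        + "\n\nQuestion: " + question;
System.out.println(chatLanguageModel.generate(prompt));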
Vector databases are incredibly useful, but they are not the be-all and end-all for joining your LLM application with custom information. They have a variety of shortfalls, including CAP theorem trade-offs (particularly eventual consistency), since your vector store and source data are maintained independently and can drift out of sync.
Furthermore, not all related information is semantically relevant. Consider a run-book describing production servers named after Greek mythological figures. A semantic search in this scenario would have a high probability of failing to retrieve relevant text.
Other Retrievers
Simply put, if a mechanism to fetch data exists, it can be adapted into a data retriever. This includes performing searches against a SQL or NoSQL database, making API calls, and more.
A wonderful pattern we have explored at Commerce Architects is using a "smart" retriever, which is essentially an Agent exposed to a variety of tools that grant it access to different data sources. This agentic retriever can critically analyze the input text and choose from various sources to find the best data to surface to the primary Agent.
Chains
The term "chain" in the context of an LLM-driven application is simply a series of sequence calls, which can include an LLM (even an LLM exposed to function calls) data stores or other pre/post-processing steps. In short, you can integrate all of the building blocks covered in this post to create "chains".
The LangChain4J framework has discarded the term "chain" in favor of "AI service," but the concept is important to understand as this term is used frequently.
By applying the fundamental building blocks described in this post, you can build a variety of chains to make a sophisticated LLM Agent, including chains for data extraction, data retrieval, function invocation, and more.
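To illustrate, here is a hand-rolled sketch of a small chain that strings together building blocks from earlier sections; the recipeRetriever used in step two is hypothetical:

// A minimal hand-rolled "chain": each step is one of the building blocks covered above
public String suggestMenu(String userMessage) {
    // 1. Reductive step: extract structured preferences from free-form text (output parsers)
    UserPreferences preferences = userPreferenceExtractor.extractUserPreferences(userMessage);

    // 2. Retrieval step: fetch candidate recipes from a data store (hypothetical retriever)
    List<String> recipes = recipeRetriever.findRecipes(
            preferences.getFoodPreferences(), preferences.getAllergies());

    // 3. Generative step: stuff the retrieved data into the prompt and generate the final answer
    String prompt = "Suggest a menu for the following goal: " + preferences.getGoal()
            + "\nCandidate recipes: " + String.join(", ", recipes);
    return chatLanguageModel.generate(prompt);
}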
Conclusion
We've covered quite a bit of ground today, from prompts to context management, data retrieval, and beyond! Learning how to weave all of these ingredients together is essential for building a highly performant and helpful Agent or LLM-driven application. Now you should have a solid understanding of core terms, concepts, limitations, and use cases for these building blocks in this emerging field. In subsequent articles, we will discuss more advanced topics in depth, like flow-control and multi-agent orchestration.
Happy coding!