A non-code primer introducing the rise and design of LLM-powered commerce agents.
John Sosoka, AI Development Lead
“The future is already here, just unevenly distributed.” - William Gibson
Imagine signing into an e-commerce platform’s merchant center and tasking an AI with performing some market research and making pricing recommendations against your catalog. That Agent asks you a few clarifying questions and then proceeds to generate and delegate specific tasks to other Agents. While you refill your morning cup of coffee, a team of semi-autonomous bots starts crawling the web to aggregate information, querying your product catalog for pricing information, and benchmarking your pricing strategy against competitors. By the time you're seated back at your desk, you have a comprehensive report waiting for you to review, complete with cited sources and actionable insights.
This example is less about competitor price analysis and more about the ability to spawn a team of entities with access to different tools to accomplish a variety of unique tasks, such as browsing the web, querying your product catalog, or writing results to a Google Doc. With a similar toolkit you could task an Agent to review your own e-commerce website, perhaps comparing the product descriptions on the website with a document provided by a vendor and ensuring that each description was published and is rendering properly.
This vision isn’t a glimpse into the near future; these multi-actor systems can be built today. While the concept of building and designing agents has existed for some time, recent technological breakthroughs have accelerated the pace of agent development and application.
Large Language Models
Large Language Models (LLMs) burst to the forefront of the conversation in the tech world seemingly overnight. You’ve probably even interacted with one, like ChatGPT or Google’s Gemini, but their use cases extend far beyond a chat conversation. Try opening ChatGPT in a separate tab and assigning it a task along with some imagined tools to accomplish the objective. You will likely see a reasonable plan articulated. As impressive as the output text may be, it does not make the Language Model an agent. Consider the prompt and response below.
Prompt:
You need to assist a customer who ordered an item some time ago. They do not know the tracking number, but they have the order ID, which is order ID 0912384.
Create a brief plan. For each entry in your plan, have 1 field representing only the tool invocation and another with a simple reason, expectation or explanation.
You have access to the following tools:
orderLookup(orderID) - Returns order details, status, tracking number
trackingLookup(trackingNumber) - looks up a tracking status
paymentStatus(orderID) - lookup the payment status for an order
escalateToHuman(message) - When uncertain, ask a human for assistance.
Response:
Step 1
Tool Invocation: orderLookup(orderID: "0912384")
Reason: To retrieve the order details, status, and tracking number for order ID 0912384. This is the first step to understand the current status of the order and to identify the next steps.
Step 2
Tool Invocation: trackingLookup(trackingNumber: "<retrieved from step 1>")
Reason: With the tracking number obtained from the order lookup, this step aims to check the current tracking status of the order. It provides detailed information on the shipment's progress.
Step 3
Tool Invocation: paymentStatus(orderID: "0912384")
Reason: To confirm the payment status of the order. This ensures there are no payment-related issues contributing to any order fulfillment delays.
Step 4
Tool Invocation: escalateToHuman(message: "Order ID 0912384 requires further investigation due to <specific reason>.")
Reason: If any discrepancies, uncertainties, or issues that cannot be resolved through automated tools are identified, this step involves escalating the matter to a human for personalized assistance.
Agents
“An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.” - Franklin and Graesser (1997)
While this next point might seem pedantic, it is critical to understand. An LLM does not qualify as an agent because it is unable to sense or interact with its environment, and therefore it cannot assign itself a task and act on it. An LLM is not actually “in” an environment; it’s simply text in and text out. Luckily, we can bolt onto this text-in, text-out interface with (artisanal, hand-crafted) software, equipping the LLM with tools to use and information about its environment.
By writing software that injects additional information into the prompt, like a backstory or purpose (your job is to pass the butter!), and adds memory, like a chat conversation history or a list of previous actions and outcomes, we can expand the LLM’s capabilities, effectively turning it into an Agent. Subroutines can prompt the LLM to analyze data, formulate plans, and revise them. We can even write code that stores and retrieves “memories” that are generated over the course of execution.
There are countless ways to use an LLM as a brain for an Agent. A rapidly growing number of projects are exploring different ways to build both solo and cooperative agents. For instance, Microsoft has been working on AutoGen, which allows multiple Language Models to collaborate and solve complex problems. Another project, CrewAI, streamlines the creation of teams composed of simple Agents, which can collaborate to complete various tasks. Hundreds of similar projects exist, all pioneering distinct and innovative ways to build agents.
To be clear, there is no single correct way to build an Agent. One particularly insightful paper, “A Survey on Large Language Model based Autonomous Agents,” surveyed a large number of LLM-based Agent projects and identified common patterns, from which the researchers proposed a unified framework for building LLM-based Agents.
The Unified Framework for LLM Based Agents
At a high level, the Unified Framework is fairly straightforward and consists of only four fundamental modules or components (see image below): Profile, Memory, Planning, and Action. When implemented, these can create an LLM-powered Agent capable of sensing and operating within an environment. (Remember that an LLM is not an Agent until its capabilities are extended via software.)
Each of these modules can be implemented in a variety of ways. We will explore each module by thinking through the design of a simple shopping assistant.
Simple Shopping Assistant
Major companies like Walmart and Instacart have been some of the first to try integrating LLMs and proto LLM-powered agents into their commerce platforms. A shopping assistant is easy to conceptualize and straightforward to build, which makes it an ideal choice to explore the unified framework.
Behavior: Create a chat-based shopping Agent which can help customers plan meals and shop on a website. This Agent should be able to query the product catalog to discuss products, form meal plans, and interact with the cart.
Profile Module
The Profile module is essentially the Agent’s identity: who it is, how it operates, what its high-level goals are, and so on. If we were defining a shopping agent, we would use the “Handcrafting” profiling method and simply define the Agent profile by hand.
“You are a helpful grocery shopping assistant for an online grocery store, tasked with helping customers explore recipes & shop for food.”
In a real-world deployment, the profiling module would be a bit more complex, blending static portions of the Agent’s high-level objectives with dynamic environment data.
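As a rough illustration of that blend, the sketch below combines the handcrafted profile text from above with a few pieces of dynamic data. The function name build_profile and the fields it accepts (store name, date, promotions) are hypothetical, chosen only for this example; a real deployment would decide for itself which environment data belongs in the profile.

```python
from datetime import date

# Static, handcrafted portion of the profile (the example sentence above).
STATIC_PROFILE = (
    "You are a helpful grocery shopping assistant for an online grocery store, "
    "tasked with helping customers explore recipes & shop for food."
)


def build_profile(store_name: str, today: date, promotions: list[str]) -> str:
    """Blend the static profile with dynamic environment data.

    All three parameters are hypothetical examples of environment data.
    """
    lines = [
        STATIC_PROFILE,
        f"Store: {store_name}",
        f"Today's date: {today.isoformat()}",
    ]
    if promotions:
        lines.append("Current promotions: " + ", ".join(promotions))
    return "\n".join(lines)


profile = build_profile("Example Grocer", date(2024, 11, 20), ["10% off produce"])
print(profile)
```

The resulting string would typically be sent to the model as the first message of the conversation.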
Memory
The Memory would be the in-context chat and tool execution output. Memory management for Agents can become very complex, but for the sake of this exercise imagine an array list of ordered text messages. Memory could also include something like a profile lookup to find additional customer information. This type of information is separate from the Agent executing a tool to query a database; often it is “front-loaded” into the Agent’s context before it has executed any tools. Future blog posts will dive into more details regarding Agent memory management.
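A minimal sketch of that idea, assuming nothing more than an ordered list of messages, might look like the following. The class and method names (Message, Memory, front_load) and the customer fields are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str      # e.g. "system", "user", "assistant", or "tool"
    content: str


@dataclass
class Memory:
    """An ordered list of messages: about the simplest agent memory possible."""
    messages: list[Message] = field(default_factory=list)

    def front_load(self, customer_info: dict[str, str]) -> None:
        # Place known customer details at the start, before any tool has run.
        summary = "; ".join(f"{key}: {value}" for key, value in customer_info.items())
        self.messages.insert(0, Message("system", f"Customer profile: {summary}"))

    def add(self, role: str, content: str) -> None:
        # Chat turns and tool output are appended in the order they occur.
        self.messages.append(Message(role, content))

    def as_prompt(self) -> str:
        return "\n".join(f"{m.role}: {m.content}" for m in self.messages)


memory = Memory()
memory.front_load({"name": "Sam", "diet": "vegetarian"})
memory.add("user", "Help me plan dinner for four.")
memory.add("tool", "search('vegetarian lasagna') returned 3 products")
print(memory.as_prompt())
```

Keeping chat turns and tool output in one ordered list means the model sees what was said and what was tried in the same sequence in which it happened.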
Planning
When humans are given a task, we must come up with a plan to execute it, and that plan depends on the task’s complexity. There is a growing body of planning strategies for LLMs, such as Chain-of-Thought and the recently published SELF-DISCOVER algorithm developed by DeepMind, which helps Agents generate a unique problem-solving plan for a given task.
Our simple shopping assistant use case would involve single-path reasoning for the Agent to determine, “The customer needs xyz, so I must search for xyz.” In more complicated scenarios, multiple agents could work together to plan and refine tasks before executing. Alternatively, an Agent could be tasked with making multiple passes against a plan, further refining it each time until it is actionable.
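The multiple-pass idea can be sketched as a small loop. This is only an illustration under stated assumptions: the "LLM" is treated as any function from prompt text to response text, the prompt wording is invented for the example, and fake_llm is a stand-in so the code runs without a real model.

```python
from typing import Callable

# Assumption: an "LLM" is any function that maps prompt text to response text.
LLM = Callable[[str], str]


def refine_plan(llm: LLM, task: str, max_passes: int = 3) -> str:
    """Ask for a plan, then ask for revisions until the plan stops changing."""
    plan = llm(f"Write a short plan for this task: {task}")
    for _ in range(max_passes):
        revised = llm(
            f"Task: {task}\nCurrent plan:\n{plan}\n"
            "Improve the plan, or repeat it unchanged if it is already actionable."
        )
        if revised == plan:
            break  # The model considers the plan actionable.
        plan = revised
    return plan


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model so the example runs offline."""
    if prompt.startswith("Write a short plan"):
        return "1. Search for xyz"
    return "1. Search for xyz\n2. Add the best match to the cart"


print(refine_plan(fake_llm, "The customer needs xyz"))
```

The max_passes cap matters in practice, since a real model may keep rewording a plan indefinitely rather than repeating it unchanged.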
Action
The Action module houses a variety of tools, allowing the Agent to interact with its environment. These could expose the Agent to a search/browse API or different cart operations. The action space is where Commerce Architects shine. We build commerce systems for a living, so the Action space for an e-commerce agent is our home turf!
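To make the idea of an action space concrete, here is a small sketch of a tool registry. Everything in it is hypothetical: the in-memory catalog and cart stand in for real commerce services, and the tool names are invented for this example.

```python
from typing import Callable

# Hypothetical in-memory stand-ins for a real catalog and cart service.
CATALOG = {"green beans": 2.49, "sweet potatoes": 3.99, "cranberry sauce": 1.79}
cart: dict[str, int] = {}


def search_catalog(query: str) -> list[str]:
    """Return product names that contain the query (case-insensitive)."""
    return [name for name in CATALOG if query.lower() in name.lower()]


def add_to_cart(product: str, quantity: int = 1) -> str:
    """Add a known product to the cart and report what happened."""
    if product not in CATALOG:
        return f"Unknown product: {product}"
    cart[product] = cart.get(product, 0) + quantity
    return f"Added {quantity} x {product}"


# The action space: tool name -> (description shown to the model, callable).
TOOLS: dict[str, tuple[str, Callable[..., object]]] = {
    "search_catalog": ("Search the product catalog by keyword.", search_catalog),
    "add_to_cart": ("Add a product to the customer's cart.", add_to_cart),
}


def describe_tools() -> str:
    """Text that can be placed in the prompt to advertise the available tools."""
    return "\n".join(f"{name}: {desc}" for name, (desc, _) in TOOLS.items())


print(describe_tools())
print(search_catalog("potato"))
print(add_to_cart("sweet potatoes", 2))
```

Keeping the description next to the callable means the text the model reads and the code that actually runs cannot drift apart.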
Once all of these modules have been implemented, an interactive shopping agent comes to life.
Here at Commerce Architects, we have built several LLM-powered shopping assistants and would be happy to demonstrate. We are able to leverage our years of experience building e-commerce systems to create a very resilient and comprehensive Action module.
Modules in Action
With the above modules implemented, our new commerce agent is ready to run. But how do all of these modules come together to create an agent?
1. Conversation starts.
When the conversation begins, an LLM will initially be given content from the Profile module. This tells the LLM who it is, what it is doing and what its limitations are.
Next, the LLM will be presented with the available tools from the Action module. This will inform the model of potential actions it can take (add to cart, browse, etc).
2. Conversation Continues / Tool Usage:
The conversation can progress in a variety of ways. Over time, the Agent may be given tasks that it can execute with tools. For example, “Please help me find Thanksgiving side dishes.” The Agent will determine side-dish options from its pre-existing knowledge and then query a search tool to get up-to-date product information from the catalog. One of the functions of an LLM integration framework is to translate text output from a language model into a method or tool invocation within our software.
3. Memory
The Agent can access the ongoing conversation over the course of different operations, using human feedback within the conversation to inform its choices over time. For the purposes of this example, the memory window cannot exceed the LLM context window, though in more sophisticated applications, memory can be further extended.
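The translation described in step 2, from model text to a tool invocation, can be sketched as follows. This is a simplified illustration and not how any specific integration framework works: it assumes the model has been instructed to reply with a small JSON object when it wants a tool, and the search_catalog stub is hypothetical.

```python
import json
from typing import Any, Callable


def search_catalog(query: str) -> list[str]:
    # Hypothetical stub; a real tool would call the catalog search API.
    return [f"{query} (sample product)"]


TOOLS: dict[str, Callable[..., Any]] = {"search_catalog": search_catalog}


def dispatch(model_output: str) -> Any:
    """Turn model text into a tool call, or pass it through as a chat reply.

    Assumes the model replies with JSON such as
    {"tool": "search_catalog", "args": {"query": "stuffing"}} to request a tool.
    """
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # Plain text: treat it as a message for the customer.
    if not isinstance(request, dict):
        return model_output
    tool_name = request.get("tool")
    if not isinstance(tool_name, str) or tool_name not in TOOLS:
        return model_output  # Unknown tool: never execute something unregistered.
    return TOOLS[tool_name](**request.get("args", {}))


print(dispatch('{"tool": "search_catalog", "args": {"query": "stuffing"}}'))
print(dispatch("Here are some Thanksgiving side dish ideas."))
```

In a full agent, the tool result would be appended to memory and the model prompted again, so that it can use the up-to-date catalog information in its next reply.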
All of the modules then come together to build a complete Agent. In summary, the Profiling module tells the LLM who it is and how it operates in the world, the Planning module enables it to formulate plans, the Action module provides inputs and a mechanism for interacting with the environment, and the Memory module gives the model a sense of where it has been and what it has tried. These four elements are the ingredients to create a semi-autonomous software entity.
Business-Facing Commerce Agents
The very same principles for creating a simple shopping agent, as outlined above, can be used to build some of the business-facing agents that we imagined at the beginning of the article. The Profiling module might instead place the bot on a merchandising team, onboarding products instead of helping a customer shop for Thanksgiving dinner. The Action module and Action space would be entirely different, as these business-facing agents would need to be exposed to other tools, such as warehousing or merchandising tools.
We have been building and integrating with these types of e-commerce systems since long before LLM-powered Agents, which makes us uniquely qualified to help create and integrate custom Agents for your business needs.
“Although generative AI burst onto the scene seemingly overnight, CEOs and other business leaders can ill afford to take an overly cautious approach to introducing it in their organizations. If ever a business opportunity demanded a bias for action, this is it.” (Source: The organization of the future: Enabled by gen AI, driven by people)
Wrap Up
We have covered quite a bit of ground today, from the differences between an LLM and an Agent all the way to high-level Agent design and construction. Hopefully, you now have a clearer understanding of how this new technology is evolving.
As the boundaries of what’s possible expand, you can begin considering how you might design and leverage Commerce Agents in your business. Be it a front-facing chat assistant helping customers plan a dinner party or a business-facing Agent assisting with vendor interaction and product onboarding, the potential applications are vast.
Stay tuned for upcoming posts as we continue to dive deeper into the more technical aspects of Commerce Agent design. If you have a business problem that you think an Agent-based model would help solve, please reach out and let's talk about how we can help you leverage this technology.