
5 DynamoDB Concepts Essential for Effective Data Modeling

Alex Thornton, Software Engineer II


In modern cloud applications, scalability is paramount. DynamoDB is engineered to support workloads at virtually any scale, handling millions of requests per second and terabytes to petabytes of data. However, poorly optimized data access patterns in DynamoDB can lead to expensive table scans and application-side joins, which heavily degrade performance and scalability. In some cases, these inefficiencies can make DynamoDB less effective than a traditional SQL database, which at least supports joins natively. By designing data models around predictable access patterns and eliminating inefficient joins, you can leverage DynamoDB’s strengths for faster, more efficient, and cost-effective data access. With the right approach, DynamoDB enables high-performance, scalable applications that handle vast datasets and high-throughput demands.


For example, during the COVID-19 pandemic, Zoom saw a sudden surge in demand for their services, with daily meeting participants growing from 10 million to 300 million. By utilizing DynamoDB’s global tables in conjunction with on-demand mode, they were able to scale nearly infinitely with no performance issues, even under unprecedented usage spikes.


Getting the most out of DynamoDB comes from knowing how to leverage its unique capabilities effectively. In the following sections, we’ll dive into real-world examples that show how to effectively use composite keys, key overloading, and data co-location to model complex relationships with ease. We’ll also explore advanced features like secondary indexes and DynamoDB Streams, which allow you to support multi-attribute querying and maintain data integrity. Together, these strategies enable you to efficiently handle diverse and high-demand data patterns, optimizing DynamoDB to meet the needs of your applications.


This blog article will define four strategies for optimizing data access in DynamoDB, as well as highlight one very valuable capability that is specific to DynamoDB. Together, these provide a clear approach to designing a DynamoDB table for the best performance in your architecture.


Use Composite Keys for Flexible Queries

In DynamoDB, a primary key uniquely identifies each item and is required for every record in the table. There are two types of primary keys: simple primary keys and composite primary keys.


A simple primary key consists of a single unique attribute known as the partition key. The partition key plays a critical role in DynamoDB's architecture, determining how data is distributed across partitions. When designed effectively, it helps ensure even data distribution. By spreading data across multiple partitions, DynamoDB can handle high-throughput workloads efficiently, avoiding bottlenecks caused by "hot" partitions. This distribution mechanism enables scalability and performance, whether you’re using a simple primary key or a more advanced key structure. While a simple primary key works well for data that is always retrieved by a unique key, it’s limited to operations on individual items.


For more complex access patterns, a composite primary key is far more versatile. It combines a partition key with a sort key, where the uniqueness is determined by the combination of both. This structure greatly enhances the capabilities of DynamoDB, enabling it to support more sophisticated data modeling and access patterns. Multiple items can share the same partition key, as long as their sort keys differ, or vice versa. This allows you to retrieve sets of related data efficiently, aggregate items, and search within ranges, all critical for handling relational-style data in a NoSQL system.
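To make this concrete, here is a minimal sketch of creating a table with a composite primary key using boto3. The table name, attribute names, and on-demand billing mode are illustrative assumptions, not prescriptions:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical "Orders" table: userId is the partition key, date the sort key.
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "date", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "userId", "KeyType": "HASH"},  # partition key
        {"AttributeName": "date", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity
)
```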


The true power of composite keys lies in their ability to allow for queries beyond simple key-value lookups. The sort key orders data within each partition, meaning that retrieving a single item is still efficient. However, composite keys also enable more advanced operations like fetching all items within a partition or performing range-based queries on those sets using operators like BETWEEN, <, >, etc. You can also use the sort key to filter items that match a certain prefix, which opens up even more possibilities for structuring your data.


Note: In DynamoDB, the BETWEEN operator can be used with dates, strings, and numbers. Dates, which are typically stored as ISO 8601 strings, also support comparison operators (<, >, <=, >=), allowing for flexible range queries.


As we dive deeper into more advanced data modeling strategies, you’ll see how composite keys are essential for designing flexible, scalable systems in DynamoDB.


Now that we’ve explored the fundamentals of composite keys in DynamoDB, let’s look at how these concepts come to life through a few examples. The following is a representative table and associated data that we will use to consider how we define keys.


In this table, we store order information where:

  • Partition key (userId) represents the user who placed the order.

  • Sort key (date) represents the date and time of the order.

| userId (pk) | date (sk) | details |
| --- | --- | --- |
| DEK-175 | 2023-12-07T11:00:05 | User DEK-175 order on 2023-12-07 |
| DEK-175 | 2024-05-14T16:20:43 | User DEK-175 order on 2024-05-14 |
| DEK-175 | 2024-05-27T11:41:38 | User DEK-175 order on 2024-05-27 |
| DEK-175 | 2024-07-01T06:27:35 | User DEK-175 order on 2024-07-01 |
| LTW-892 | 2024-06-27T21:22:04 | User LTW-892 order on 2024-06-27 |
| LTW-892 | 2024-06-27T13:30:57 | User LTW-892 order on 2024-06-27 |
| GEM-506 | 2024-01-30T18:39:19 | User GEM-506 order on 2024-01-30 |
| GEM-506 | 2024-09-24T02:24:55 | User GEM-506 order on 2024-09-24 |

Query Examples:

  1. Retrieve All Orders for a User

To retrieve all orders placed by user LTW-892, you can use a simple query:

pk = "LTW-892"

This will return both of their orders, sorted by the date in the sort key.
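In boto3, that query might look like this minimal sketch, assuming the hypothetical Orders table defined earlier:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table name

# All orders for one user; results come back sorted by the date sort key.
response = table.query(KeyConditionExpression=Key("userId").eq("LTW-892"))
orders = response["Items"]
```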


  2. Retrieve Orders for a Specific Month (begins_with)

If you want to retrieve all of user DEK-175’s orders from May 2024, you can use a query like this:


pk = "DEK-175" AND sk BEGINS_WITH "2024-05"


This query efficiently fetches all orders placed by DEK-175 during May 2024.


  3. Retrieve Orders Within a Specific Date Range (BETWEEN)

To retrieve orders placed by DEK-175 within a specific range, for example, between May 14, 2024 and July 1, 2024, you could use the BETWEEN operator:


pk = "DEK-175" AND sk BETWEEN "2024-05-14T00:00:00" AND "2024-07-01T23:59:59"


This would return all orders placed in that time frame.
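Expressed with boto3's condition helpers, the two range queries above might look like the following sketch, again against the hypothetical Orders table:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table name

# Prefix match on the ISO 8601 sort key: all May 2024 orders for DEK-175.
may_orders = table.query(
    KeyConditionExpression=Key("userId").eq("DEK-175")
    & Key("date").begins_with("2024-05")
)["Items"]

# Inclusive range: all orders between two timestamps.
range_orders = table.query(
    KeyConditionExpression=Key("userId").eq("DEK-175")
    & Key("date").between("2024-05-14T00:00:00", "2024-07-01T23:59:59")
)["Items"]
```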


  4. Single Item Lookup

If you need to look up a specific order, say for user GEM-506 at a specific timestamp, you can use both the partition and sort keys:


pk = "GEM-506" AND sk = "2024-01-30T18:39:19"


This query will return the exact order details.
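Because the full composite key is known here, this can be a GetItem call rather than a Query; a sketch:

```python
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table name

# Exact-match lookup with the full composite key returns at most one item.
item = table.get_item(
    Key={"userId": "GEM-506", "date": "2024-01-30T18:39:19"}
).get("Item")
```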


Overloading Keys for Multiple Use Cases

Now that you’re using a composite key consisting of a partition key and a sort key, you can overload these keys to support even more access patterns. Key overloading involves appending additional attributes or context to the keys, allowing greater flexibility in how your data is accessed and retrieved.


For the partition key, overloading is particularly useful when dealing with multiple types of items in a single table (a common strategy in DynamoDB). By appending the item type to a unique identifier (e.g., user#1234, order#5678), you can distinguish between different entities, while still grouping related data under the same partition key when necessary. This allows for efficient querying of different item types using a single table while maintaining clear boundaries between them.


When it comes to the sort key, overloading enables even more fine-grained control over how items are accessed. You can append various attributes to the sort key (such as types, relationships, or dates) to support complex queries. Overloading the sort key can also allow it to function as a hierarchy where you can filter as broadly or as narrowly as you would like.


Note: While it’s common to include multiple attributes within the sort key for optimized querying, it’s considered best practice to store these attributes separately in their own dedicated fields as well. This approach avoids the need to parse data directly from the keys.
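For example, writing one of the order items from the table below might look like this sketch, with the store ID and date duplicated as plain attributes alongside the overloaded keys (the AppTable name is an assumption):

```python
import boto3

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical single-table design

table.put_item(
    Item={
        "pk": "order#DEK-175",
        "sk": "store#G73HN#2024-05-27T11:41:38",
        "userId": "DEK-175",                 # duplicated from the pk
        "storeId": "G73HN",                  # duplicated from the sk
        "orderDate": "2024-05-27T11:41:38",  # duplicated from the sk
        "details": "User DEK-175 order on 2024-05-27",
    }
)
```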


In this table, we store both order and store information where:

  • Order partition key represents the item type and the userId.

  • Product partition key represents the item type.

  • Order sort key represents the store and time of the order.

  • Product sort key represents the product categories followed by the product ID.

| pk | sk | details |
| --- | --- | --- |
| order#DEK-175 | store#G73HN#2023-12-07T11:00:05 | User DEK-175 order on 2023-12-07 |
| order#DEK-175 | store#G73HN#2024-05-14T16:20:43 | User DEK-175 order on 2024-05-14 |
| order#DEK-175 | store#H87JK#2024-05-27T11:41:38 | User DEK-175 order on 2024-05-27 |
| entity#product | produce#fruit#apple#sugar-bee#YT81S | Sugar Bee Apple |
| entity#product | frozen#breakfast#waffles#eggo-waffles#BM3A1 | Eggo Waffles |

Query Examples:

  1. Using the Overloaded Sort Key to Filter Data

To retrieve all orders placed by user DEK-175 at store G73HN in 2024, you can leverage the overloaded sort key to filter by store ID and year:

pk = "order#DEK-175" AND sk BEGINS_WITH "store#G73HN#2024"


  2. Using the Overloaded Sort Key to Search a Hierarchy

To retrieve all products in the frozen section, or to find all the apples, you can use the hierarchical structure of the sort key:

pk = "entity#product" AND sk BEGINS_WITH "frozen"

pk = "entity#product" AND sk BEGINS_WITH "produce#fruit#apple"


Beyond composite keys, key overloading is the next step in designing a table with effective and efficient data modeling that can support an entire application with a single table. This approach makes your queries more efficient and allows DynamoDB to handle increasingly complex data relationships and retrievals with minimal overhead.


Co-Locate Relational Data

To ensure an even distribution of items across partitions, it is recommended to design your partition key with high cardinality. This helps avoid performance issues caused by hot partitions: partitions that receive disproportionate traffic due to frequent access patterns. However, when certain items are frequently accessed together, strategically placing them on the same partition can be advantageous for performance.


Access patterns that would typically require joining two tables on a shared attribute in a SQL database can still be achieved in DynamoDB, but through a different, and often more efficient, approach: co-locating related data within the same partition. This isn’t the only way to simulate a join, and in certain use cases another technique may be a better fit, but it is a common one. By storing related items together under a common partition key, DynamoDB allows you to retrieve all necessary data with a single, efficient query, eliminating the need for costly application-side joins. The rule of thumb: if data needs to be retrieved at the same time, put it in the same place. If you needed all of a product’s details and reviews, you could fetch everything from the database with a single request. As we have already been doing in the previous examples, if we need all of a user’s orders, we save them on the same partition under that user’s ID. If we needed all of a store’s employees, we would store those employee records under the store’s ID.


To further demonstrate the power of co-locating related data in DynamoDB, consider a scenario where you want to retrieve all products included in a user’s order. In a traditional SQL database, this would likely require joining three tables: Users, Orders, and Products. Each table would need to be queried and joined based on shared attributes, such as user IDs and order IDs, leading to complex queries and increased latency.


In DynamoDB, you can streamline this process by storing all related items together within the same partition. For example, you could use the user ID as the partition key and structure the sort key to include the order and product information, like order#<order-id>#product#<product-id>. This approach allows you to retrieve all products in a user’s order with a single query, without the need for multiple joins.
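A sketch of that single query, assuming a hypothetical order ID of 5678 and the key layout just described:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical single-table design

# One request returns every product line item in the order, because they all
# live in the user's partition under a shared sort key prefix.
order_products = table.query(
    KeyConditionExpression=Key("pk").eq("user#DEK-175")
    & Key("sk").begins_with("order#5678#product#")
)["Items"]
```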


You don’t need to store all product details in these items, just enough data for purposes such as a cart preview. When a user clicks on a specific item, a separate query can be made to fetch the full product information. This approach balances storage efficiency with retrieval speed, leveraging some intentional data duplication, which is inexpensive in DynamoDB. The slight redundancy greatly enhances both the simplicity and performance of your queries, making your data model far more efficient and easier to manage.


Use Secondary Indexes to Target Specific Queries

In some cases, your current key structure may limit your ability to query data in different ways. For example, you might want to retrieve store inventory based on different attributes, but your table’s sort key may only allow queries based on product category. To address this, you can create a secondary index with an alternative sort key, such as the product brand.


Secondary indexes—Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI)—provide additional flexibility by allowing you to define alternative partition and sort keys for querying the same data.


  • Global Secondary Indexes (GSI) allow you to specify a new partition key and a new sort key, creating an entirely separate index that can be queried independently of the base table.

  • Local Secondary Indexes (LSI), on the other hand, retain the same partition key as the base table but allow you to specify a different sort key, giving you more query options without needing to restructure your entire data model.


These indexes are particularly useful for modeling many-to-many relationships or supporting access patterns that aren’t possible with your base table’s key structure.


Example: Local Secondary Index (LSI)

Consider a table where product data is stored with a sort key based on product category. If you needed to look up a product by its unique ID, it would be inefficient, as you’d either need to know the entire hierarchy or perform a full table scan. By creating an LSI where the sort key is the product ID, you can optimize queries to directly retrieve products by ID.

| Partition Key (PK) | Sort Key (SK) | LSI Sort Key (LSI_SK) |
| --- | --- | --- |
| entity#product | produce#fruit#apple#sugar-bee#YT81S | id#YT81S |
| entity#product | frozen#breakfast#waffles#eggo-waffles#BM3A1 | id#BM3A1 |

With the LSI, querying by product ID (e.g., YT81S or BM3A1) becomes straightforward and efficient.
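A query against that LSI might look like this sketch (the index name is an assumption; note also that LSIs can only be defined at table creation time):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical single-table design

# Direct lookup by product ID via the LSI's alternate sort key.
products = table.query(
    IndexName="ProductIdIndex",  # hypothetical LSI name
    KeyConditionExpression=Key("pk").eq("entity#product")
    & Key("LSI_SK").eq("id#YT81S"),
)["Items"]
```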


Example: Global Secondary Index (GSI)

As we’ve already discussed, partition keys group items together, but sometimes you need to group data based on more than one attribute. If a hierarchical partition key doesn’t fit your use case, a GSI can provide the flexibility you need.

| pk | sk |
| --- | --- |
| order#DEK-175 | store#G73HN#2023-12-07T11:00:05 |
| order#GMI-102 | store#G73HN#2024-05-14T16:20:43 |
| order#WQE-932 | store#H87JK#2024-05-27T11:41:38 |

For instance, say you need to retrieve all orders for a specific date. Without a GSI, you would have to know every user ID and then query for their orders individually. By creating a GSI with the order date as the partition key and a combination of user ID and order ID as the sort key, you can efficiently retrieve all orders for a specific date, regardless of the user.

| GSI_PK | GSI_SK |
| --- | --- |
| order#2024-12-09 | user#DEK-175#order#MEI-092 |
| order#2024-12-09 | user#GMI-102#order#WIU-188 |
| order#2024-11-29 | user#WQE-932#order#PCX-678 |

This allows you to perform queries like “Get all orders for December 9th” without needing to scan the entire table.
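Querying the GSI is just a Query call with an IndexName; a sketch with an assumed index name:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical single-table design

# All orders placed on 2024-12-09, across every user, straight from the GSI.
day_orders = table.query(
    IndexName="OrdersByDateIndex",  # hypothetical GSI name
    KeyConditionExpression=Key("GSI_PK").eq("order#2024-12-09"),
)["Items"]
```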


Secondary indexes significantly enhance your ability to query data by enabling alternative access patterns. If you need to optimize queries for specific attributes within the same partition, LSIs are ideal. For broader querying needs across multiple attributes, GSIs provide greater flexibility. Both tools can significantly improve the efficiency and structure of your data retrieval processes.


Keep in mind that indexes are not free: every write to the base table is also propagated to its indexes, and LSIs share the base table’s read and write capacity while GSIs consume capacity of their own. Ideally, you should work to minimize the need for them through proper table design. And if you find yourself using a GSI to build a view that you frequently scan to generate aggregate data, there is a better option: DynamoDB Streams.


DynamoDB Streams

While most of the concepts discussed so far apply broadly to NoSQL databases, DynamoDB Streams are a unique feature that sets DynamoDB apart. Streams extend the functionality of your data model and can significantly reduce the need for repetitive or on-demand compute operations.


DynamoDB Streams capture changes to items in your table—such as INSERT, MODIFY, or REMOVE operations—and make those changes available for processing in near real time. This allows you to respond to data changes immediately and perform tasks asynchronously without adding extra overhead to your main application. Streams are very powerful: they let you trigger any sort of background process whenever data is updated, from kicking off additional processing to sending notifications or collecting analytics. In relation to data modeling, they are particularly useful for pre-computing aggregations and maintaining data consistency.
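Enabling a stream on an existing table is a one-call change; a minimal sketch (the view type and table name are assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Emit both the old and new item images for every change to the table.
dynamodb.update_table(
    TableName="AppTable",  # hypothetical table name
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```

A common consumer is an AWS Lambda function attached to the stream through an event source mapping, as sketched in the first example below.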


Example 1: Tracking Item Views for Recommendations and Popularity Metrics

Let’s say you have a table where you store each user’s viewed items in their partition. This helps you recommend previously viewed items. However, you also want to track how many times each product has been viewed across all users, which requires storing the total view count at the product level.


  • Whenever a new “viewed item” record is added to a user’s partition (e.g., pk = "user#1234" AND sk = "viewed#product#456"), DynamoDB Streams can capture this event.

  • A process listens to the stream and, upon detecting the new user view, increments a “total views” counter stored under the product’s partition (e.g., pk = "product#456" AND sk = "metadata").


This way, you’re maintaining real-time popularity data for each product without querying the entire table to count views. The process is efficient because the view count update happens asynchronously, without affecting the user’s immediate experience or requiring complex scans.
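A sketch of such a stream consumer as a Lambda handler, following the hypothetical key formats and attribute names used above:

```python
import boto3

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical table name

def handler(event, context):
    """Triggered by the DynamoDB stream; bumps product-level view counters."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        sort_key = record["dynamodb"]["Keys"]["sk"]["S"]
        if not sort_key.startswith("viewed#product#"):
            continue
        product_id = sort_key.rsplit("#", 1)[-1]
        # Atomically increment the counter on the product's metadata item.
        table.update_item(
            Key={"pk": f"product#{product_id}", "sk": "metadata"},
            UpdateExpression="ADD totalViews :one",
            ExpressionAttributeValues={":one": 1},
        )
```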


Example 2: Tracking Inventory Levels

In a grocery store, managing inventory levels in real time is critical to ensure customers can only purchase items that are available. DynamoDB Streams can help you synchronize stock levels across different views and provide notifications when items go out of stock or are restocked.


  • Whenever a change is made to a product inventory (e.g., pk = "product#456" AND sk = "inventory"), DynamoDB Streams captures this event.

  • A process listens to the stream, determines if the inventory is low or zero, and updates any user carts that contain that item (e.g., pk = "user#1234" AND sk = "cart#789"). This event can also trigger a notification alerting the customer of the change.


This approach avoids the need for synchronous calls to update every cart when a single item’s inventory changes, keeping the data in sync without complex queries.


Example 3: Maintaining Consistency Between Detailed and Preview Records

Let’s say that for each product you store detailed product information in one record and preview details, like title, price, and rating, in another record. When a change is made to either of these records, the other should be updated to maintain data consistency.


  • Whenever a change is made to a detailed product record (e.g., pk = "product#456" AND sk = "details"), DynamoDB Streams captures this event.

  • A process listens to the stream, determines if any relevant detail changed for the preview record, and updates that record (e.g., pk = "product#456" AND sk = "preview").


In systems where data duplication is common, DynamoDB Streams provide an efficient way to maintain data consistency without causing slowdowns. By capturing changes in real time, DynamoDB Streams enable synchronization across different records and views, ensuring your application reflects up-to-date information. This not only keeps your data consistent but also opens up possibilities for optimizing performance, such as precomputing aggregations to enhance response times. As a result, you can handle high traffic volumes while keeping your application fast, reliable, and in sync with your dataset.


Modeling in DynamoDB does require a paradigm shift, but it offers serious efficiency benefits. By adopting this approach, developers can optimize performance and scalability. With enough practice and understanding of its principles, DynamoDB modeling can be successfully applied to a wide range of applications, even those with numerous complex access patterns. It does, however, require a thoughtful design process that aligns closely with application requirements.



