Reputation: 46763
I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
email: "foo@foo.com",
... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
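For illustration, here is a minimal sketch of how the request for that lookup might be built. The table name "Users" and the index name "email-index" are hypothetical placeholders, and the dict is shaped for the low-level boto3 client's query call.

```python
# Hypothetical names; substitute the real table and index names.
TABLE_NAME = "Users"
EMAIL_INDEX = "email-index"


def build_email_lookup(email: str) -> dict:
    """Build keyword arguments for a DynamoDB Query against the email GSI."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": EMAIL_INDEX,
        # A partition key condition must be an equality test.
        "KeyConditionExpression": "#email = :email",
        # The placeholder avoids any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#email": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


params = build_email_lookup("foo@foo.com")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

Note that a global secondary index does not enforce uniqueness, so the result may contain more than one item for the same email unless the application prevents duplicates itself.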
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key (itemTypeAndIndex) to allow multiple items per userId.
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "main", // sort key
email: "foo@foo.com",
... other attributes ...
}
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "Email-2", // sort key
email: "bar@bar.com"
// no more attributes
}
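A small sketch of how the application might produce this layout from a list of addresses. The helper name build_user_items is made up for illustration, and numbering the extra items from Email-2 is simply taken from the example above.

```python
def build_user_items(user_id: str, emails: list[str], **attributes) -> list[dict]:
    """Return one "main" item plus one "Email-N" item per additional address."""
    if not emails:
        raise ValueError("a user needs at least one email address")
    # The first address is treated as the primary one and lives on the main item.
    main = {
        "userId": user_id,
        "itemTypeAndIndex": "main",
        "email": emails[0],
        **attributes,
    }
    # Additional addresses get their own items, numbered from 2.
    extras = [
        {"userId": user_id, "itemTypeAndIndex": f"Email-{n}", "email": email}
        for n, email in enumerate(emails[1:], start=2)
    ]
    return [main, *extras]


items = build_user_items(
    "08074c7e0c0a4453b3c723685021d0b6",
    ["foo@foo.com", "bar@bar.com"],
)
```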
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
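The swap could be expressed as a single transaction along these lines. This is only a sketch: the table name "Users" and the helper names are assumptions, and the dict is shaped for the low-level boto3 client's transact_write_items call.

```python
TABLE_NAME = "Users"  # hypothetical table name


def _update_email(user_id: str, sort_key: str, expected: str, new: str) -> dict:
    """One Update action that only succeeds if the item still holds `expected`."""
    return {
        "Update": {
            "TableName": TABLE_NAME,
            "Key": {
                "userId": {"S": user_id},
                "itemTypeAndIndex": {"S": sort_key},
            },
            "UpdateExpression": "SET #email = :new",
            # If the item changed in the meantime, the whole transaction fails.
            "ConditionExpression": "#email = :expected",
            "ExpressionAttributeNames": {"#email": "email"},
            "ExpressionAttributeValues": {
                ":new": {"S": new},
                ":expected": {"S": expected},
            },
        }
    }


def build_swap_primary_email(user_id, primary, secondary, secondary_sort_key):
    """Build a transaction that swaps the emails on the main and secondary items."""
    return {
        "TransactItems": [
            _update_email(user_id, "main", expected=primary, new=secondary),
            _update_email(user_id, secondary_sort_key, expected=secondary, new=primary),
        ]
    }


swap = build_swap_primary_email(
    "08074c7e0c0a4453b3c723685021d0b6", "foo@foo.com", "bar@bar.com", "Email-2"
)
# Could be sent as: boto3.client("dynamodb").transact_write_items(**swap)
```

Both updates either apply together or not at all, so a reader never sees the same address on both items.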
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
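As a rough sketch of the delete case: after a Query on userId has returned the keys of all of a user's items, they could be grouped into BatchWriteItem requests, which accept at most 25 put or delete requests per call. The table name "Users" and the helper name are hypothetical.

```python
TABLE_NAME = "Users"  # hypothetical table name
MAX_BATCH = 25  # BatchWriteItem accepts at most 25 requests per call


def build_delete_batches(keys: list[dict]) -> list[dict]:
    """Group item keys into BatchWriteItem requests of at most 25 deletes each."""
    requests = [{"DeleteRequest": {"Key": key}} for key in keys]
    return [
        {"RequestItems": {TABLE_NAME: requests[i:i + MAX_BATCH]}}
        for i in range(0, len(requests), MAX_BATCH)
    ]


# Example: a user with 60 secondary items (keys as returned by a Query on userId).
keys = [
    {"userId": {"S": "u1"}, "itemTypeAndIndex": {"S": f"Email-{n}"}}
    for n in range(2, 62)
]
batches = build_delete_batches(keys)
# Each batch could be sent as: boto3.client("dynamodb").batch_write_item(**batch)
```

BatchWriteItem is not atomic and can return UnprocessedItems, so a real implementation would need to retry those until none remain.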
The same approach (new items with same userId but different sort keys) could be used for other 1-user-has-many-values data that needs to be Query-able.
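For example, if every item type uses a sort key prefix such as "Email-", all values of one type for a user could be fetched with a begins_with key condition. A minimal sketch, again assuming a hypothetical table named "Users":

```python
TABLE_NAME = "Users"  # hypothetical table name


def build_items_by_type_query(user_id: str, item_type: str) -> dict:
    """Build a Query for all of one user's items whose sort key starts with the type."""
    return {
        "TableName": TABLE_NAME,
        # begins_with is allowed on the sort key in a key condition.
        "KeyConditionExpression": (
            "userId = :uid AND begins_with(itemTypeAndIndex, :prefix)"
        ),
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":prefix": {"S": f"{item_type}-"},
        },
    }


query = build_items_by_type_query("08074c7e0c0a4453b3c723685021d0b6", "Email")
# Could be sent as: boto3.client("dynamodb").query(**query)
```

With the layout above, this matches Email-2 and later items but not the "main" item, whose email would still have to be read separately.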
Is this a good way to do it? Is there a better way?
Upvotes: 0
Views: 614
Reputation: 578
Justin, I would strongly advise against using DynamoDB for searching on attributes. I am not saying you can't achieve this. However, I see a few problems that will eventually get in your way if you go this route.
Thus, as the number of search criteria grows, this solution can easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, there are a few options you could choose from, based on your requirements and budget, that address this problem using a combination of databases.
Option 1.
DynamoDB as a primary store and AWS Elasticsearch as secondary storage [Preferred]
In your application, use DynamoDB to fetch user records by id. For all other search criteria (such as email, phone number, zip code, or location), fetch the records from AWS Elasticsearch. By default, AWS Elasticsearch indexes all the attributes of your record, so you can search on any field with millisecond latency.
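To make the search side concrete, here is a sketch of an exact-match search body for an email address. It assumes the index was created with Elasticsearch's default dynamic mapping, where a string field such as email also gets a keyword sub-field (email.keyword) that holds the unanalyzed value. The function name is hypothetical.

```python
import json


def build_email_search(email: str) -> dict:
    """Build an Elasticsearch search body that matches one exact email address."""
    # A term query on the keyword sub-field compares the whole, unanalyzed string.
    return {"query": {"term": {"email.keyword": email}}}


body = build_email_search("foo@foo.com")
# The JSON below would be sent to the index's _search endpoint.
payload = json.dumps(body)
```

The userId values returned by the search could then be used to read the full records from DynamoDB.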
Option 2.
Use AWS Aurora [Less preferred solution]
If your application has a relational use case, where the data are related, you may consider this option. Note that Aurora is a SQL database. Since it is relational storage, you can organize the records in multiple tables and join them on the primary keys of those tables.
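A minimal sketch of what that relational layout could look like. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere. Aurora itself is MySQL- or PostgreSQL-compatible, and the table and column names here are assumptions.

```python
import sqlite3

USER_ID = "08074c7e0c0a4453b3c723685021d0b6"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id TEXT PRIMARY KEY,
        name    TEXT
    );
    -- One row per address; the primary key keeps each email unique.
    CREATE TABLE user_emails (
        email      TEXT PRIMARY KEY,
        user_id    TEXT NOT NULL REFERENCES users(user_id),
        is_primary INTEGER NOT NULL DEFAULT 0
    );
    """
)
conn.execute("INSERT INTO users VALUES (?, ?)", (USER_ID, "Example User"))
conn.executemany(
    "INSERT INTO user_emails VALUES (?, ?, ?)",
    [("foo@foo.com", USER_ID, 1), ("bar@bar.com", USER_ID, 0)],
)


def find_user_by_email(conn: sqlite3.Connection, email: str):
    """Return (user_id, name) for the owner of the address, or None if unknown."""
    return conn.execute(
        """
        SELECT u.user_id, u.name
        FROM users AS u
        JOIN user_emails AS e ON e.user_id = u.user_id
        WHERE e.email = ?
        """,
        (email,),
    ).fetchone()
```

Here any of a user's addresses resolves to the same row in users, and the primary key on email rejects duplicate addresses at the database level.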
I would suggest the first option.
Having said that, I will leave it up to you to decide. 😊
Upvotes: 1