DynamoDB table/index schema design for querying multi-valued attributes

Question

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:

{ 
  userId: "08074c7e0c0a4453b3c723685021d0b6",  // partition key
  email: "foo@foo.com",
  ... other attributes ...
}

When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
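For reference, here is a minimal sketch of that lookup. It only builds the request parameters for the low-level Query operation, so it runs without AWS access. The table name `Users` and index name `email-index` are assumptions for illustration and do not come from the actual app.

```python
def build_email_lookup_query(email, table_name="Users", index_name="email-index"):
    """Build Query parameters for a GSI whose partition key is `email`.

    `table_name` and `index_name` are hypothetical placeholders.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # A Query key condition must test the index partition key for equality.
        "KeyConditionExpression": "#email = :email",
        "ExpressionAttributeNames": {"#email": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   response = boto3.client("dynamodb").query(**build_email_lookup_query("foo@foo.com"))
#   matching_items = response["Items"]
```

Because the index is keyed on a single scalar `email` attribute, this works only while each item carries exactly one email address.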

But we want to allow multiple email addresses per user, and a DynamoDB key attribute (for a table or an index) must be a scalar String, Number, or Binary, so a List-typed email attribute can't be used in a Query KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.

Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?

Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.

{
  userId: "08074c7e0c0a4453b3c723685021d0b6",  // partition key
  itemTypeAndIndex: "main",  // sort key
  email: "foo@foo.com",
  ... other attributes ...
}

If the user adds a second, third, etc. email, then add a new item for each email, like this:

{
  userId: "08074c7e0c0a4453b3c723685021d0b6",  // partition key
  itemTypeAndIndex: "Email-2",  // sort key
  email: "bar@bar.com"
  // no more attributes
}
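To make the layout concrete, here is a small sketch that produces the items for one user from a list of emails. The helper name `build_user_items` is hypothetical, and the "Email-N" numbering simply follows the example above.

```python
def build_user_items(user_id, emails, other_attributes=None):
    """Return one "main" item plus one "Email-N" item per additional email.

    Hypothetical helper: the first email is treated as the primary one.
    """
    if not emails:
        raise ValueError("a user needs at least one email address")

    # The main item carries the primary email and all other user attributes.
    main_item = dict(other_attributes or {})
    main_item.update(
        {"userId": user_id, "itemTypeAndIndex": "main", "email": emails[0]}
    )

    items = [main_item]
    # Additional emails become bare items numbered from 2, as in the example.
    for n, email in enumerate(emails[1:], start=2):
        items.append(
            {"userId": user_id, "itemTypeAndIndex": f"Email-{n}", "email": email}
        )
    return items
```

Each returned dict uses plain Python types, which is the shape that boto3's `Table.put_item(Item=...)` accepts.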

The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
If a user wants to change their primary email address, we'd swap the email values in the "main" item and the relevant "Email-N" item. (Now that DynamoDB supports transactions, doing this will be safer than before!)
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users, we'd have to merge all the items for both userIds.
The same approach (new items with the same userId but different sort keys) could be used for other one-user-has-many-values data that needs to be queryable.
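The swap and delete operations described above could be sketched as follows. The code only builds parameters for TransactWriteItems and BatchWriteItem, so it runs without AWS access. The table name `Users` and both helper names are assumptions for illustration.

```python
def build_swap_primary_email(user_id, secondary_sort_key, current_primary,
                             current_secondary, table_name="Users"):
    """Build TransactWriteItems parameters that swap two email values.

    Each update is conditional on the value we expect to replace, so the
    whole transaction fails if either item changed in the meantime.
    """
    def update(sort_key, expected, new):
        return {"Update": {
            "TableName": table_name,
            "Key": {"userId": {"S": user_id},
                    "itemTypeAndIndex": {"S": sort_key}},
            "UpdateExpression": "SET #email = :new",
            "ConditionExpression": "#email = :expected",
            "ExpressionAttributeNames": {"#email": "email"},
            "ExpressionAttributeValues": {":new": {"S": new},
                                          ":expected": {"S": expected}},
        }}

    return {"TransactItems": [
        update("main", current_primary, current_secondary),
        update(secondary_sort_key, current_secondary, current_primary),
    ]}


def build_delete_batches(user_id, sort_keys, table_name="Users", batch_size=25):
    """Build BatchWriteItem parameters deleting every item of one user.

    BatchWriteItem accepts at most 25 requests per call, hence the chunking.
    `sort_keys` would come from a Query on the table with userId as the key.
    """
    requests = [
        {"DeleteRequest": {"Key": {"userId": {"S": user_id},
                                   "itemTypeAndIndex": {"S": sort_key}}}}
        for sort_key in sort_keys
    ]
    return [
        {"RequestItems": {table_name: requests[i:i + batch_size]}}
        for i in range(0, len(requests), batch_size)
    ]
```

With boto3, these dicts would be passed as `client.transact_write_items(**params)` and `client.batch_write_item(**batch)`. A batch delete is not atomic, and any `UnprocessedItems` in the response would need to be retried.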

Is this a good way to do it? Is there a better way?
