AWS DynamoDB Table Design: Store two UserIDs and Details in Table

Question

I'm building an app where two users can connect with each other and I need to store that connection (e.g. a friendship) in a DynamoDB table. Basically, the connection table has have two fields:

userIdA (hash key)
userIdB (sort key)

I was thinking to add an index on userIdB to query on both fields. Should I store a connection with one record (ALICE, BOB) or two records (ALICE, BOB; BOB, ALICE)? The first option needs one write operation and less space, but I have to query twice to get all all connections of an user. The second option needs two write operations and more space, but I only have to query once for the userId.

The user tablehas details like name and email:

userId (hash key)
name (sort key)
email

In my app, I want to show all connections of a certain user with user details in a listview. That means I have two options:

Store the user details of the connected users also in the connection table, e.g. add two name fields to that table. This is fast, but if the user name changes (name and email are retrieved from Facebook), the details are invalid and I need to update all entries.
Query the user details of each userId with a Batch Get request to read multiple items. This may be slower, but I always have up to date user details and don't need to store them in the connection table.

So what is the better solution, or are there any other advantages/disadvantages that I may have overlooked?

EDIT

After some google research regarding friendship tables with NoSQL databases, I found the following two links:

How does Facebook maintain a list of friends for each user? Does it maintain a separate table for each user?

NoSQL Design Patterns for Relational Data

The first link suggests to store the connection (or friendship) in a two way direction with two records, because it makes it easier and faster to query:

Connections:
    1 userIdA    userIdB
    2 userIdB    userIdA

The second link suggests to save a subset of duplicated data (“summary”) into the tables to read it faster with just one query. That would be mean to save the user details also into the connection table and to save the userIds into an attribute of the user table:

Connections:
    #    userIdA    userIdB    userDetails                    status
    1        123        456    { userId: 456, name: "Bob" }   connected
    2        456        123    { userId: 123, name: "Alice" } connected

Users:
    #     userId       name    connections
    1        123      Alice    { 456 }
    2        456        Bob    { 123 }

This database model makes it pretty easy to query connections, but seems to be difficult to update if some user details may change. Also, I'm not sure if I need the userIds within the user table again because I can easily query on a userId.

What do you think about that database model?

Sam Hartman · Accepted Answer

In general, nosql databases are often combined with a couple of assumptions:

Eventual consistency is acceptable. That is, it's often acceptable in application design if during an update some of the intermediate answers aren't right. That is, it might be fine if for a few seconds while alice is becoming Bob's friend, It's OK if "Is Alice Bob's friend" returns true while "is Bob Alice's friend" returns false
Performance is important. If you're using nosql it's generally because performance matters to you. It's also almost certainly because you care about the performance of operations that happen most commonly. (It's possible that you have a problem where the performance of some uncommon operation is so bad that you can't do it; nosql is not generally the answer in that situation)
You're willing to make uncommon operations slower to improve the performance of common operations.

So, how does that apply to your question. First, it suggests that ultimately the answer depends on performance. That is, no matter what people say here, the right answer depends on what you observe in practice. You can try multiple options and see what results you get.

With regard to the specific options you enumerated.

Assuming that performance is enough of a concern that nosql is a reasonable solution for your application, it's almost certainly query rather than update performance you care about. You probably will be happy if you make updates slower and more expensive so that queries can be faster. That's kind of the whole point.
You can likely handle updates out of band--that is eventually consistency likely works for you. You could submit update operations to a SQS queue rather than handling them during your page load. So if someone clicks a confirm friend button, you could queue a request to actually update your database. It is OK even if that involves rebuilding their user row, rebuilding the friend rows, and even updating some counts about how many friends they have.
It probably does make sense to store a friend row in each direction so you only need one query.
It probably does make sense to store the user information like Name and picture that you typically display in a friend list duplicated in the friendship rows. Note that whenever the name or picture changes you'll need to go update all those rows.
It's less clear that storing the friends in the user table makes sense. That could get big. Also, it could be tricky to guarantee eventual consistency. Consider what happens if you are processing updates to two users' friendships at the same time. It's very important that you not end up with inconsistency once all the dust has settled.
Whenever you have non-normalized data such as duplicating rows in each direction, or copying user info into friendship tables, you want some way to revalidate and fix your data. You want to write code that in the background can go scan your system for inconsistencies caused by bugs or crashed activities and fix them.

AWS DynamoDB Table Design: Store two UserIDs and Details in Table

EDIT

Answers (2)

Related Questions