Joshua Foxworth

Reputation: 1397

DynamoDB - GSI versus duplication

I have a question about many-to-many relationships within DynamoDB and what happens on a GSI versus shallow duplication.

Say I want to model the standard many-to-many relationship in social media: a user can follow many pages, and a page has many followers. The access patterns are therefore to pull all the followers of a page and to see all the pages that a user follows.

If you create an item that has a partition key of the page ID and a sort key of the user ID, this lets you pull all followers for that page.

You could then place a GSI on that item with an inverted index. This would let you query all pages a user is following.

What exactly is happening there? Is DynamoDB duplicating that data somewhere with the keys rearranged? Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

So, you have this item:

Item 1:
PK                       SK
FOLLOWEDPAGE#<PageID>    USER#<UserId>

And you can create a GSI and invert SK and PK, or you could simply create this second item:

Item 2:
PK                       SK
FOLLOWINGUSER#<UserId>   PAGE#<PageID>

Other than the fact that you now have to maintain this second item, how is this functionally different?

Does a GSI duplicate items with that index? Does it duplicate items without that index?

Upvotes: 0

Views: 980

Answers (1)

fedonev

Reputation: 25819

Is DynamoDB duplicating that data somewhere with the keys rearranged?

Yes, a secondary index is an opaque copy of your data. As the docs say: "A secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support Query operations." You choose which attributes get copied (in DynamoDB terms, projected) to the index.
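To make the alternate key and the projection concrete, here is a minimal sketch of a table definition with an inverted GSI. The table name `Follows`, the index name `InvertedIndex`, on-demand billing, and the `KEYS_ONLY` projection are assumptions for illustration, not something taken from the question. The function only builds the request parameters; with boto3 installed and credentials configured, they could be passed to `client.create_table(**params)`.

```python
def build_create_table_params(table_name="Follows", index_name="InvertedIndex"):
    """Build CreateTable parameters for a table with an inverted GSI.

    table_name and index_name are hypothetical names chosen for this sketch.
    """
    return {
        "TableName": table_name,
        # On-demand billing (an assumption) avoids specifying throughput here.
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        # Base table: PK = FOLLOWEDPAGE#<PageID>, SK = USER#<UserId>
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # Inverted index: the table's SK becomes the index partition key.
                "KeySchema": [
                    {"AttributeName": "SK", "KeyType": "HASH"},
                    {"AttributeName": "PK", "KeyType": "RANGE"},
                ],
                # Only the key attributes are copied into the index.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
    }


params = build_create_table_params()
```

Querying the index with `SK = USER#<UserId>` would then return the pages that user follows, without a second item in the base table.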

Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

Apart from the maintenance burden you mention, conceptually they are similar. There are some technical differences between a Global Secondary Index and DIY replication:

  • A GSI requires its own provisioned throughput, separate from the table's, although the read and write units consumed and storage costs incurred are the same for both approaches.
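  • A GSI is eventually consistent: queries against it cannot request strongly consistent reads, whereas a second item in the base table can be read with strong consistency.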
  • A Scan operation will be roughly twice as expensive with the DIY approach, because the table holds roughly twice as many items.

See the Best practices for using secondary indexes in DynamoDB for optimization patterns.
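For comparison, the DIY approach could be sketched as a single transactional write that stores both items together, so the two copies cannot drift apart. The table name `Follows` and the helper name are hypothetical. The dictionary follows the shape of the low-level `TransactWriteItems` request and could be passed to boto3's `client.transact_write_items(**params)`; nothing is sent to AWS here.

```python
def build_follow_transaction(page_id, user_id, table_name="Follows"):
    """Build TransactWriteItems parameters that write both follow items.

    table_name is a hypothetical name chosen for this sketch.
    """

    def put(pk, sk):
        # Low-level attribute format: each value is tagged with its type ("S" = string).
        return {
            "Put": {
                "TableName": table_name,
                "Item": {"PK": {"S": pk}, "SK": {"S": sk}},
            }
        }

    return {
        "TransactItems": [
            # Item 1: query by page to list its followers.
            put(f"FOLLOWEDPAGE#{page_id}", f"USER#{user_id}"),
            # Item 2: query by user to list the pages they follow.
            put(f"FOLLOWINGUSER#{user_id}", f"PAGE#{page_id}"),
        ]
    }


params = build_follow_transaction("p1", "u1")
```

With a GSI, only the first item would be written and DynamoDB would maintain the inverted copy itself.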

Does a GSI duplicate items with that index?

Yes.

Does it duplicate items without that index?

No. An item is copied to the index only if it contains the index's key attributes.
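The last two answers can be illustrated with a small in-memory model. This is only a sketch of the rule that a GSI tracks items whose index key attributes exist, not how DynamoDB is implemented. The attribute names `GSI1PK` and `GSI1SK` and the sample items are hypothetical.

```python
def project_to_index(items, index_hash, index_range):
    """Return copies of the items that would appear in the index.

    An item is included only if it has both index key attributes.
    """
    projected = [
        dict(item)
        for item in items
        if index_hash in item and index_range in item
    ]
    # Order by index key, the way a Query on the index would see the items.
    return sorted(projected, key=lambda it: (it[index_hash], it[index_range]))


table = [
    {"PK": "FOLLOWEDPAGE#p1", "SK": "USER#u2", "GSI1PK": "USER#u2", "GSI1SK": "PAGE#p1"},
    {"PK": "FOLLOWEDPAGE#p1", "SK": "USER#u1", "GSI1PK": "USER#u1", "GSI1SK": "PAGE#p1"},
    # No GSI1PK/GSI1SK attributes, so this item is not copied to the index.
    {"PK": "PAGE#p1", "SK": "METADATA", "Title": "Example page"},
]

index = project_to_index(table, "GSI1PK", "GSI1SK")
print(len(index))  # 2
```

In the inverted-index case from the question, every item has both PK and SK, so every item ends up in the index.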

Upvotes: 2
