Joey Yi Zhao
Joey Yi Zhao

Reputation: 42596

How to do global query on dynamodb table?

Dynamodb is a name - value database and query requires a partition key. I am saving user data in Dyanmodb table but I'd like to know what the best way to do a global search.

My table includes these fields:

id (PK)
firstName
lastName
email
phone
company ( GSI PK )

The id is the parition key for the table and company is the PK for a gsi. I usually use the id to query individual user and use the company gsi to queryusers under a company with pagination.

Now I get a requirement to query users globally (without any company). I am not sure how I should support this since it doesn't have any PK in the query. And I don't want to use scan since it is too expensive.

One solution I can think of is to create a separate field which has a fixed value for all items. And create a GIS on this field. In this way, I can use the fixed value as PK to query all users. But it will create a hot partition in the table I want to avoid. Is there any other way to do that?

Upvotes: 0

Views: 113

Answers (1)

Maurice
Maurice

Reputation: 13187

One solution I can think of is to create a separate field which has a fixed value for all items. And create a GIS on this field. In this way, I can use the fixed value as PK to query all users. But it will create a hot partition in the table I want to avoid. Is there any other way to do that?

You're on the right track here, the risk of a hot partition is also spot-on. For the solution, we can make use of bucketing.

First, I understand that your access pattern looks something like this: getUserByUsername(username: str).

That means you know the username you're looking for. In order to solve the problem of a hot partition, you could calculate a separate partition key value (gsi2_pk) for the GSI based on the username, e.g. take the first two characters.

That means the table layout could be something like this:

gsi2_pk gsi2_sk
jo joey yi zhao
jo johnny b goode
ma maurice

This way you distribute your users across a lot more partitions.

The drawback here is that your usernames probably wouldn't be evenly distributed across the buckets and you may inadvertently create more hot partitions. Another approach would be to have a fixed number of buckets (n) and you put set gsi2_pk to hash(username) % n, which will more evenly distribute the items.

Upvotes: 0

Related Questions