luis

Reputation: 2305

Is this the right approach to index creation in DynamoDB?

I want to create an app that has a list of clients with ids (emails in this case), their phone number and other important information. Most of the time the clients table will be searched by using the client id (their email) but occasionally I want to be able to do this search using their phone number. Basically I want the app to have a text field where you can either type the email or phone number and be able to retrieve the client data.

Client ( ID, PhoneNumber, Name, LastName, etc...)

After researching DynamoDB, I came up with the following design: a clients table whose hash key is the id, provisioned with high read and write throughput, since querying by this attribute will be the most common task. I then created a global secondary index keyed on the phoneNumber attribute and gave it low read and write throughput, since looking up a client by phone number won't be very frequent. The app will never make an update using the phone number as a key; it will only make updates using the id as a key.

Is this the right approach, or is there a better way to do it? Are the throughput values right for my needs, or is there no need to have any write throughput on the secondary index? Is there something wrong with my thought process?

Thank you very much!

UPDATE: Let's say the clients table has a read throughput of 10 and a write throughput of 10 on the primary index. This means that I can read about 10 data packages (of 4KB each) per second by executing a GetItem operation where I provide only the primary key. And I can write 10 data packages (of 1KB each) per second by executing something like PutItem or UpdateItem, where the primary key is the only way of looking up the item to be updated. Is that right?

Now, for the GSI: if I set, say, a read throughput of 4 and a write throughput of 4 on the global secondary index, would that mean I can read a maximum (without throttling) of 4 packages (of 4KB each) per second by executing a GetItem operation where I provide only the secondary key as the lookup method? And since I am not interested in writing or updating any data based on the secondary key, because my app will only run UpdateItem or PutItem using the primary key, would I be able to set a write throughput of 0 for the GSI?

Or are the GSI writes affected by the primary key operations as well? I know for sure that the phone number attribute will almost never change after the item is created (unless I change it myself using the DynamoDB manager), so I could even make the write throughput 1.

Upvotes: 0

Views: 686

Answers (1)

Ben Schwartz

Reputation: 1756

Yes, this is exactly how to use DynamoDB.

The base table schema can be a hash key with the email address and no range key.

You can provision a Global Secondary Index (GSI) with the phone number attribute as a hash key. The provisioning level for the GSI does not have to match the base table, though you should provision writes to account for the fact that every time the item is written to the base table, the item (or portion of the item) will also be written to the GSI. If the GSI is underprovisioned, it could lead to unexpected throttling on the base table.

Additionally, please be aware that this schema does not enforce phone number uniqueness. A GSI can have duplicate hash key values, so when you query the GSI you may get back multiple results.

Here is some general documentation on Global Secondary Indexes: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html

UPDATE: Every time you make a write to the base table, if the attributes that appear in the schema of the GSI are present in the item, it will be written to the GSI. The GSI is really just a second DynamoDB table maintained for you by the service - it requires the same write throughput as the base table if most/all items will contain phone numbers.
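To make the capacity arithmetic concrete, here is a rough sketch of how read and write capacity units scale with item size and request rate. It assumes the standard unit sizes (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost); the function names and item sizes are made up for illustration, and it ignores details such as how Query sums item sizes before rounding.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers a strongly consistent read of up to 4 KB
WRITE_UNIT_BYTES = 1024      # one WCU covers a write of up to 1 KB


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the RCUs needed to read items of one size at a given rate."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the WCUs needed to write items of one size at a given rate."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


# Hypothetical workload: 800-byte client items, 10 writes per second.
# If every item carries a phone number, the GSI sees those writes too,
# so it needs roughly the same write capacity as the base table.
base_table_wcu = write_capacity_units(800, 10)
gsi_wcu = write_capacity_units(800, 10)
```

Under these assumptions both values come out to 10, which is why a write throughput of 0 or 1 on the index would throttle writes to the base table.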

The only operations valid for a GSI are Query and Scan. To retrieve the item by phone number you can use the Query API with a condition for the hash key equal to the one you are looking for.
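For the single text field described in the question, one possible approach is to pick the operation based on what was typed. The sketch below only builds the request parameters as plain dictionaries, in the shape the low-level DynamoDB API expects; the table name "Clients", the index name "PhoneNumberIndex", and the "contains @" rule are assumptions for illustration, while ID and PhoneNumber come from the schema in the question.

```python
def build_lookup_request(search_text, table_name="Clients", index_name="PhoneNumberIndex"):
    """Return (operation_name, params) for a lookup by email or by phone number.

    table_name and index_name are hypothetical; adjust them to your own schema.
    """
    text = search_text.strip()
    if "@" in text:
        # Email: direct GetItem on the base table's hash key.
        return "get_item", {
            "TableName": table_name,
            "Key": {"ID": {"S": text}},
        }
    # Phone number: GetItem is not available on a GSI, so use Query on the index.
    return "query", {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "PhoneNumber = :phone",
        "ExpressionAttributeValues": {":phone": {"S": text}},
    }


operation, params = build_lookup_request("555-0100")
```

With a boto3 low-level client you could then call something like getattr(client, operation)(**params). Because the GSI does not enforce uniqueness, the Query response may contain more than one item, so the caller should be prepared to handle a list.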

In terms of provisioning the GSI separately, the GSI can have Read Capacity completely independent of the base table. So for example:

Base table is provisioned with 10 RCU and 10 WCU

GSI is provisioned with 1 RCU and 10 WCU.
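As a sketch, that provisioning could be expressed as CreateTable parameters like the ones below. The table name, index name, string attribute types, and the ALL projection are assumptions for illustration; only the key attributes and the capacity numbers come from this thread.

```python
def build_create_table_params(table_name="Clients", index_name="PhoneNumberIndex"):
    """Build CreateTable parameters for the base table plus the phone number GSI.

    Names, attribute types, and the projection type are hypothetical choices.
    """
    return {
        "TableName": table_name,
        # Only attributes used in a key schema need to be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "ID", "AttributeType": "S"},
            {"AttributeName": "PhoneNumber", "AttributeType": "S"},
        ],
        # Base table: hash key on the email, no range key.
        "KeySchema": [{"AttributeName": "ID", "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": "PhoneNumber", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                # Low read capacity, but write capacity matching the base table.
                "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 10},
            }
        ],
    }


create_params = build_create_table_params()
```

These parameters could be passed to a boto3 client as client.create_table(**create_params). Projecting only the attributes you need instead of ALL would make each index write smaller, at the cost of not returning the full item from the index.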

Upvotes: 2
