Cmag

Reputation: 15750

DynamoDB Table Structure for object

Folks, what would you suggest as the DynamoDB table structure for the following object? There will be roughly 2 million objects, which will need to be searchable by email and/or organization.

{
  email: '[email protected]',
  organization: 'foobar'
}

What would you make the hash/range keys? I need to be able to perform the following operations:

Should I add a random id parameter to the table? I would imagine the following is the correct way:

Thanks

Upvotes: 1

Views: 720

Answers (3)

Erben Mo

Reputation: 3614

In your base table, use email as the hash key: it is more random than organization, so it will partition well.

Create a GSI with Organization as hashkey.
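A minimal sketch of that layout, written as the parameters for DynamoDB's CreateTable call. The table name `Members`, the index name `organization-index`, and the throughput numbers are assumptions for illustration, not something from the question.

```python
def build_table_definition(table_name="Members", read_units=10, write_units=10):
    """Return CreateTable parameters: email as hash key, GSI on organization.

    The names and capacity values are hypothetical placeholders.
    """
    return {
        "TableName": table_name,
        # Only attributes used in a key schema need to be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "email", "AttributeType": "S"},
            {"AttributeName": "organization", "AttributeType": "S"},
        ],
        # Base table: email alone identifies an item.
        "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "organization-index",
                "KeySchema": [{"AttributeName": "organization", "KeyType": "HASH"}],
                # The table's own key (email) is always carried into the index.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": read_units,
                    "WriteCapacityUnits": write_units,
                },
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }
```

With boto3 installed and credentials configured, this could be passed along as `boto3.client("dynamodb").create_table(**build_table_definition())`.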

1) Retrieve all emails for specific organization

Query your GSI with the hash key equal to the target organization.
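As a rough sketch, the Query request against the index might be built like this. The table and index names are the same hypothetical ones as before, and the `#org` placeholder is only there to sidestep any reserved-word clash.

```python
def build_org_query(organization, table_name="Members", index_name="organization-index"):
    """Return Query parameters that fetch every item for one organization.

    table_name and index_name are hypothetical; use whatever the table
    was actually created with.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#org = :org",
        "ExpressionAttributeNames": {"#org": "organization"},
        "ExpressionAttributeValues": {":org": {"S": organization}},
    }
```

With boto3 this would be something like `boto3.client("dynamodb").query(**build_org_query("foobar"))`. A large organization may come back in several pages, so the caller should keep going while the response contains a LastEvaluatedKey.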

2) Delete specific email

Easily done, because email is the hash key of your base table.
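For illustration, the DeleteItem request only needs the email, since that is the whole primary key in this layout. The table name and the sample address are hypothetical.

```python
def build_delete_request(email, table_name="Members"):
    """Return DeleteItem parameters for a table whose hash key is email."""
    return {
        "TableName": table_name,
        "Key": {"email": {"S": email}},
    }


# Hypothetical address, shown only to illustrate the shape of the request.
request = build_delete_request("user@example.com")
```

With boto3 the dictionary could be passed as `boto3.client("dynamodb").delete_item(**request)`.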

A low provisioned throughput will still work; the only effect is that your scan will take longer. If your provisioned read throughput is 10, then your scan will take about:

21000 / 10 = 2100 seconds.
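The same estimate as a tiny helper, in case you want to try other throughput settings. It assumes the scan is throttled to exactly the provisioned rate and ignores bursts and retries, so treat the result as a ballpark figure.

```python
def scan_duration_seconds(total_read_units, provisioned_read_units_per_second):
    """Rough time for a full scan when reads are capped at the provisioned rate."""
    return total_read_units / provisioned_read_units_per_second


seconds = scan_duration_seconds(21000, 10)
print(seconds, "seconds, or", seconds / 60, "minutes")  # 2100.0 seconds, or 35.0 minutes
```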

For the Scan operation you can set a Limit on how many items each call should evaluate. The result will also include a LastEvaluatedKey, which you can pass as the ExclusiveStartKey of your next Scan call to get the next page.
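A sketch of that paging loop. To keep it runnable without AWS access, the function takes the scan call as an argument; `scan_all` and its parameter names are my own, and the callable is assumed to behave like boto3's `client.scan`.

```python
def scan_all(scan, table_name, page_size=100):
    """Yield every item in the table, fetching one page per Scan call.

    `scan` is any callable shaped like boto3's client.scan: it accepts
    keyword arguments and returns a dict with "Items" and, when more
    pages remain, "LastEvaluatedKey".
    """
    params = {"TableName": table_name, "Limit": page_size}
    while True:
        page = scan(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no key returned means this was the final page
        # Resume the next call right after the last item already seen.
        params["ExclusiveStartKey"] = last_key
```

With boto3 the call would look like `for item in scan_all(boto3.client("dynamodb").scan, "Members"): ...`, where `Members` is again a hypothetical table name. A small page size spreads the read cost over more calls, which suits a low provisioned throughput.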

Upvotes: 0

Cmag

Reputation: 15750

The problem is the provisioned capacity, and Scan operations. If you have 1 million records at 85 bytes each, that amounts to about 85,000 KB, which at 4 KB per read unit would require roughly 21,000 provisioned reads for a full scan!
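The arithmetic behind that figure, as a small sketch. It assumes one read unit covers up to 4 KB and that a Scan is billed on the total bytes read rather than per item; the real cost also depends on per-request rounding, so this is an estimate only.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one strongly consistent read unit covers up to 4 KB


def scan_read_units(item_count, item_size_bytes, eventually_consistent=False):
    """Estimate the read units consumed by scanning the whole table."""
    total_bytes = item_count * item_size_bytes
    units = math.ceil(total_bytes / READ_UNIT_BYTES)
    # Eventually consistent reads are charged at half the rate.
    return math.ceil(units / 2) if eventually_consistent else units


print(scan_read_units(1_000_000, 85))                              # 20752
print(scan_read_units(1_000_000, 85, eventually_consistent=True))  # 10376
```

Scan uses eventually consistent reads unless told otherwise, so the second number is probably closer to what a default scan would consume.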

At this point, to keep the costs down, I see no other alternative than to have the following structure:

| Hash Key | Range Key    | Secondary Range Key |
| 1        | organization | email               |

in other words:

| Hash Key | Range Key    | Secondary Range Key |
| 1        | foo          | [email protected]   |
| 1        | bar          | [email protected]   |
| 1        | foo          | [email protected]   |

This means you always know your hash key, and with it you can run queries on specific range keys.

Thoughts?

Upvotes: 0

rpmartz

Reputation: 3809

It seems that either of those would distribute your objects well as hash keys, so I don't know that either of them is necessarily a better hash key per se. I think that the fact that you'll need to retrieve all of the specific emails for an organization makes that the better candidate for a hash key, though. You can just do a query using the organization to get all of an organization's emails.

Note that in order to support the use cases you described, you'll need a global secondary index. This answer may be helpful in showing why, but assuming that you went with Organization as the table hash key, you'd need a global secondary index on email to retrieve a specific email (or retrieve that item to delete it).
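A hedged sketch of the delete-by-email path under that layout. It assumes the table key is organization (hash) plus email (range), so that one organization can hold many emails, and a GSI named `email-index` keyed on email; both are my assumptions, as are the table name and the `delete_by_email` helper. The `client` argument is expected to behave like a boto3 DynamoDB client.

```python
def delete_by_email(client, email, table_name="Members", index_name="email-index"):
    """Find an item through the email GSI, then delete it by its full key.

    Returns the number of items deleted. All names are hypothetical.
    """
    found = client.query(
        TableName=table_name,
        IndexName=index_name,
        KeyConditionExpression="#email = :email",
        ExpressionAttributeNames={"#email": "email"},
        ExpressionAttributeValues={":email": {"S": email}},
    )
    deleted = 0
    for item in found.get("Items", []):
        # The base table needs both key parts; the GSI lookup supplies
        # the organization that the caller did not know.
        client.delete_item(
            TableName=table_name,
            Key={"organization": item["organization"], "email": item["email"]},
        )
        deleted += 1
    return deleted
```

The extra Query is the price of keying the table on organization: a delete that starts from an email alone cannot address the item directly.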

Upvotes: 1
