Reputation: 15770

DynamoDB Table Structure for object

Folks, what DynamoDB table structure would you suggest for the following object? There will be roughly 2 million objects, which need to be searchable by email and/or organization.

{
  email: 'user@example.com',
  organization: 'foobar'
}

What would you choose as the hash and range keys? I need to be able to perform the following operations:

Retrieve all emails for a specific organization
Delete a specific email

Should I add a random id attribute to the table? I would imagine the following is the correct way:

organization as the hash key, email as the range key.
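A minimal sketch of the key schema proposed above, written as the keyword arguments one would pass to boto3's low-level `client.create_table(**params)`. The use of boto3, the table name `emails`, and the throughput numbers are assumptions for illustration; only the two attribute names come from the question.

```python
def question_table_params(table_name="emails"):
    """Schema proposed in the question: organization = hash key, email = range key.

    Intended for boto3's client.create_table(**params). The table name and
    throughput values are hypothetical placeholders.
    """
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "organization", "AttributeType": "S"},
            {"AttributeName": "email", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "organization", "KeyType": "HASH"},
            {"AttributeName": "email", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    }


params = question_table_params()
print([k["KeyType"] for k in params["KeySchema"]])  # ['HASH', 'RANGE']
```

Under this schema each item's primary key is the (organization, email) pair, so that pair alone must be unique.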

Thanks

Upvotes: 1

Answers (3)

Erben Mo

Reputation: 3634

In your base table, use email as the hash key: it is more random than organization, so the table will be partitioned well.

Create a global secondary index (GSI) with organization as its hash key.
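A sketch of this layout as keyword arguments for boto3's low-level `client.create_table(**params)`. boto3, the table name `emails`, the index name `organization-index`, and the throughput values are all assumptions, not part of the answer.

```python
def base_table_with_org_gsi(table_name="emails", index_name="organization-index"):
    """Base table keyed on email, plus a GSI whose hash key is organization.

    Intended for boto3's client.create_table(**params); names and throughput
    values are hypothetical.
    """
    throughput = {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5}
    return {
        "TableName": table_name,
        # Every attribute used in a table or index key must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "email", "AttributeType": "S"},
            {"AttributeName": "organization", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": "organization", "KeyType": "HASH"}],
                # KEYS_ONLY still copies the table key (email) into the index,
                # which is all that "list emails for an organization" needs.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
        "ProvisionedThroughput": dict(throughput),
    }


params = base_table_with_org_gsi()
print(params["GlobalSecondaryIndexes"][0]["IndexName"])  # organization-index
```

A GSI has its own provisioned throughput, separate from the base table's, which is why it appears twice.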

1) Retrieve all emails for a specific organization

Query the GSI with the hash key equal to the target organization.

2) Delete a specific email

This is easy, because email is the hash key of the base table: a DeleteItem call needs only the email.
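The two operations above can be sketched as request parameters for boto3's low-level `client.query(**params)` and `client.delete_item(**params)`. This assumes the layout from this answer (base table keyed on email, a GSI on organization); the names `emails` and `organization-index` are hypothetical.

```python
def org_emails_query(org, table_name="emails", index_name="organization-index"):
    """Parameters for client.query(**params) against the organization GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # A placeholder name (#org) avoids any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#org = :org",
        "ExpressionAttributeNames": {"#org": "organization"},
        "ExpressionAttributeValues": {":org": {"S": org}},
    }


def delete_email_request(email, table_name="emails"):
    """Parameters for client.delete_item(**params); email is the full primary key."""
    return {"TableName": table_name, "Key": {"email": {"S": email}}}


print(org_emails_query("foobar")["ExpressionAttributeValues"])  # {':org': {'S': 'foobar'}}
```

Note that GSI queries are always eventually consistent, so a just-written email may take a moment to appear in the organization query.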

A low provisioned throughput will still work; the only effect is that your scan will take longer. If your provisioned read throughput is 10 read capacity units, the scan will take about:

21000 / 10 = 2100 seconds (roughly 35 minutes).
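The estimate above is a single division; a small helper makes the inputs explicit. It is a rough model only: it ignores burst capacity and any other traffic competing for the same read units.

```python
def scan_duration_seconds(table_read_units, provisioned_read_units):
    """Approximate time for one full Scan when reads are capped at the provisioned rate.

    table_read_units: total read capacity units a full scan consumes.
    provisioned_read_units: read capacity units per second provisioned on the table.
    Ignores burst capacity and concurrent traffic.
    """
    return table_read_units / provisioned_read_units


seconds = scan_duration_seconds(21000, 10)
print(seconds, seconds / 60)  # 2100.0 35.0
```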

I think for a Scan operation you can set a limit on how many items it should return. The result will also include a LastEvaluatedKey, which you pass as ExclusiveStartKey in your next Scan call to get the next page.
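A sketch of that paging loop. The `scan` argument is assumed to behave like boto3's low-level `client.scan` (accepting `TableName`, `Limit`, `ExclusiveStartKey` and returning `Items` plus `LastEvaluatedKey` while more pages remain); `fake_scan` and its five rows are hypothetical stand-ins so the loop can run without AWS.

```python
def scan_all(scan, table_name, page_size=100):
    """Read a whole table page by page using Limit / LastEvaluatedKey."""
    items = []
    kwargs = {"TableName": table_name, "Limit": page_size}
    while True:
        page = scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no LastEvaluatedKey means this was the final page
        # Resume just after the last key DynamoDB evaluated.
        kwargs["ExclusiveStartKey"] = last_key


# Hypothetical in-memory stand-in for client.scan, for demonstration only.
ROWS = [{"email": {"S": f"user{i}@example.com"}} for i in range(5)]


def fake_scan(TableName, Limit, ExclusiveStartKey=None):
    start = 0 if ExclusiveStartKey is None else ROWS.index(ExclusiveStartKey) + 1
    page = ROWS[start:start + Limit]
    result = {"Items": page}
    if start + Limit < len(ROWS):
        result["LastEvaluatedKey"] = page[-1]
    return result


print(len(scan_all(fake_scan, "emails", page_size=2)))  # 5
```

boto3 also provides `client.get_paginator("scan")`, which runs the same loop for you.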

Upvotes: 0

Cmag

Reputation: 15770

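The problem is the provisioned capacity and Scan operations. If you have 1 million records of 85 bytes each, that amounts to about 85,000 KB. Since one read capacity unit covers 4 KB per second, scanning the whole table in one second would require about 21,000 provisioned read capacity units!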
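A sketch of the capacity arithmetic for 1 million items of 85 bytes each, assuming 4 KB (4096 bytes) per read capacity unit. It ignores the per-page rounding DynamoDB applies, so treat it as an estimate; the result lands close to the answer's 21,000 for strongly consistent reads. Scan is eventually consistent by default, which halves the cost.

```python
import math

RCU_BYTES = 4 * 1024  # one read capacity unit covers 4 KB of strongly consistent reads


def full_scan_read_units(item_count, item_bytes, consistent=True):
    """Read capacity units consumed by scanning every item once.

    Scan cost depends on the total size of the items read, rounded up to 4 KB.
    Eventually consistent reads (Scan's default) cost half as much.
    Per-page rounding is ignored, so this is an estimate.
    """
    units = math.ceil(item_count * item_bytes / RCU_BYTES)
    return units if consistent else math.ceil(units / 2)


print(full_scan_read_units(1_000_000, 85))                    # 20752
print(full_scan_read_units(1_000_000, 85, consistent=False))  # 10376
```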

At this point, to keep costs down, I see no alternative other than the following structure:

| Hash Key | Range Key    | Secondary Range Key |
| 1        | organization | email               |

In other words:

| Hash Key | Range Key    | Secondary Range Key |
| 1        | foo          | user1@example.com   |
| 1        | bar          | user2@example.com   |
| 1        | foo          | user3@example.com   |

This means you always know the hash key, and with it you can run queries on specific range key values.

Thoughts?

Upvotes: 0

rpmartz

Reputation: 3809

Either of those would likely distribute your objects well as a hash key, so I'm not sure one is inherently better. However, because you need to retrieve all of the emails for an organization, organization is the better candidate for the hash key: a single query on the organization returns all of its emails.
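A sketch of that query as parameters for boto3's low-level `client.query(**params)`, assuming the table's hash key is organization. boto3 and the table name `emails` are assumptions for illustration.

```python
def org_members_query(org, table_name="emails"):
    """Parameters for client.query(**params) on a table whose hash key is organization.

    No IndexName is set, so the query runs against the base table. Results come
    back in pages of up to 1 MB; follow LastEvaluatedKey for large organizations.
    """
    return {
        "TableName": table_name,
        # Placeholder name (#org) avoids any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#org = :org",
        "ExpressionAttributeNames": {"#org": "organization"},
        "ExpressionAttributeValues": {":org": {"S": org}},
    }


print(org_members_query("foobar")["KeyConditionExpression"])  # #org = :org
```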

Note that to support both of the use cases you described, you'll need a global secondary index. This answer may help show why; in short, assuming you use organization as the table's hash key, you'd need a global secondary index on email to retrieve a specific email (or to look up that item in order to delete it).
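A sketch of the look-up-then-delete flow, assuming a base table keyed on (organization, email) and a GSI whose hash key is email. boto3's low-level client, the table name `emails`, and the index name `email-index` are hypothetical. Because the base table's key includes organization, the GSI query is what recovers it before calling DeleteItem.

```python
def email_lookup_query(email, table_name="emails", index_name="email-index"):
    """Parameters for client.query(**params) on a hypothetical GSI keyed on email."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#e = :e",
        "ExpressionAttributeNames": {"#e": "email"},
        "ExpressionAttributeValues": {":e": {"S": email}},
    }


def delete_from_lookup(items, table_name="emails"):
    """Build client.delete_item(**params) requests from items the GSI query returned.

    GSI items always carry the base table's key attributes, so organization is
    available even with a KEYS_ONLY projection. If the same email exists under
    several organizations, one delete request is produced per item.
    """
    return [
        {"TableName": table_name,
         "Key": {"organization": item["organization"], "email": item["email"]}}
        for item in items
    ]


found = [{"organization": {"S": "foobar"}, "email": {"S": "a@example.com"}}]
print(delete_from_lookup(found)[0]["Key"]["organization"])  # {'S': 'foobar'}
```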

Upvotes: 1
