Reputation: 1853
I'd like to use AWS DynamoDB as a datastore for a data-collection application, where the data schema may vary over time.
For example, initially an Item may represent attributes of people e.g. {name, age}. However, later the schema may be modified to contain {name, age, gender}.
Each schema modification will be tracked and versioned, and older data won't need to be migrated, but it may still need to be queried alongside newer data.
Is it an acceptable pattern to store each data-schema change in its own table? Is there a straightforward mechanism to query aggregated data across tables?
Upvotes: 2
Views: 3485
Reputation: 5944
Schemas for DynamoDB tables are dynamic in nature. The only thing that must be defined up front is the key: its attribute name(s) and type(s). Global secondary indexes (indexes with a different partition key) can be added at any time. Local secondary indexes (same partition key, different sort key), however, can only be created when the table is created. Because of this dynamic schema, you can start writing new fields, or stop writing them, at any time.
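For example (a minimal boto3 sketch; the People table, its userId/itemId keys, and the schemaVersion attribute are illustrative assumptions), only the key attributes are declared at table creation, and items written later can carry whatever extra attributes each schema version defines:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Only the key schema is fixed up front; non-key attributes are never declared.
table = dynamodb.create_table(
    TableName="People",
    KeySchema=[
        {"AttributeName": "userId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "itemId", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "itemId", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Schema v1 item: {name, age}
table.put_item(Item={"userId": "u1", "itemId": "rec-001",
                     "schemaVersion": 1, "name": "Ada", "age": 36})

# Schema v2 item: {name, age, gender} -- same table, no migration needed.
table.put_item(Item={"userId": "u1", "itemId": "rec-002",
                     "schemaVersion": 2, "name": "Alan", "age": 41,
                     "gender": "male"})
```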
You need to design tables around how you will query them. Queries are quite restricted: you can filter, but filtering is neither fast nor cheap. Fast queries rely on existing indexes, and a single query can only read from one table. Joins and unions aren't available, so any aggregation across schema versions (or across tables) has to happen in your application.
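As a rough sketch of adding a query path after the fact (the name-index name and the name attribute are made up, and this assumes the on-demand billing mode used above; a provisioned table would also need a ProvisionedThroughput block in the Create spec):

```python
import boto3

client = boto3.client("dynamodb")

# Add a global secondary index to an existing table so "find by name"
# becomes a fast Query on the index instead of a filtered Scan.
client.update_table(
    TableName="People",
    AttributeDefinitions=[{"AttributeName": "name", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "name-index",
            "KeySchema": [{"AttributeName": "name", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
)
```

Once the index is active, you can pass IndexName="name-index" to a Query.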
A table scan reads the whole table; there is no key criterion, only filters. With a filter, items are still read from disk and only then dropped from the returned set, so you pay for everything read. It's an expensive operation in both cost and time. Queries that pass a partition key are much faster because they fetch data from a single partition. So you might want to design a composite key with both a partition key (userId, for instance) and a sort key (item id). Composite keys are common in DynamoDB.
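To make the difference concrete, here is a sketch using the same hypothetical People table as above, contrasting a keyed Query with a filtered Scan (the attribute names are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("People")

# Query: touches only the "u1" partition; fast and cheap.
people = table.query(
    KeyConditionExpression=Key("userId").eq("u1") & Key("itemId").begins_with("rec-")
)["Items"]

# Scan: reads every item in the table; the filter only trims the
# result set after the read, so you still pay for the full scan.
adults = table.scan(FilterExpression=Attr("age").gte(18))["Items"]
```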
It's also important to avoid hot spots inside a table: data should be distributed fairly evenly across partition keys.
Reference: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html
Upvotes: 3