Reputation: 1524
Our requirement is very simple: we want to store GPS locations for sensors, and no record needs to be kept for more than a couple of days. The granularity of the data would be at most around one minute.
Since the total number of sensors could exceed a billion, SimpleDB is not an option unless I write the partitioning logic myself. SimpleDB does, however, index every attribute, which makes it possible to run periodic cleanup scripts that delete entries older than 2 days.
DynamoDB looks far better since it has no limit on the amount of data, and I can use a partition+range primary key on sensorID+timestamp. However, deleting old data would require a scan, unless I also add a global secondary index on the timestamp field; using that global secondary index, the query could potentially be quicker.
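Concretely, the table I have in mind would look something like this boto3 sketch (the table and attribute names are only placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="SensorLocations",
    AttributeDefinitions=[
        {"AttributeName": "sensorID", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},  # epoch seconds
    ],
    KeySchema=[
        {"AttributeName": "sensorID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```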
Is it just me, or is there a better way? Using DynamoDB/SimpleDB is preferable since the entire deployment is in the AWS environment and we don't want to invest much in ops. I know other NoSQL databases such as MongoDB support this kind of workload.
Upvotes: 0
Views: 1612
Reputation: 836
There is a new feature added in DynamoDB: please check TTL (Time to Live).
It will delete an item automatically once that item's TTL has expired.
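A minimal sketch of wiring TTL up with boto3; the table and attribute names here are assumptions for illustration:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Tell DynamoDB which numeric attribute holds the expiry time (epoch seconds).
dynamodb.update_time_to_live(
    TableName="SensorLocations",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expiresAt"},
)

# Stamp each location with an expiry two days out; DynamoDB removes the item
# automatically some time after that point passes (deletion is not
# instantaneous, but it consumes no write throughput).
now = int(time.time())
dynamodb.put_item(
    TableName="SensorLocations",
    Item={
        "sensorID": {"S": "sensor-123"},
        "timestamp": {"N": str(now)},
        "expiresAt": {"N": str(now + 2 * 24 * 3600)},
    },
)
```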
Upvotes: 2
Reputation: 2129
You can save entries in date-based tables at `x`-day increments:

GPS_LOCATIONS_09052016
GPS_LOCATIONS_09072016
...

Then you can drop old tables every `x` day(s).
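A rough sketch of that rotation with boto3, assuming the GPS_LOCATIONS_MMDDYYYY naming above and x = 2 days; list_tables pagination is omitted for brevity:

```python
import datetime
import boto3

dynamodb = boto3.client("dynamodb")

X_DAYS = 2  # retention window, matching the question's two-day requirement
TODAY = datetime.date.today()

def table_name(day: datetime.date) -> str:
    # Write-path helper: matches the GPS_LOCATIONS_MMDDYYYY naming above.
    return "GPS_LOCATIONS_" + day.strftime("%m%d%Y")

# Drop any date-based table that has fallen out of the retention window.
for existing in dynamodb.list_tables()["TableNames"]:
    if not existing.startswith("GPS_LOCATIONS_"):
        continue
    day = datetime.datetime.strptime(existing[-8:], "%m%d%Y").date()
    if (TODAY - day).days > X_DAYS:
        dynamodb.delete_table(TableName=existing)
```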
How many GPS locations are there per sensor? If you have, for example, 500 million unique sensors, then partitioning on sensor ID isn't very efficient.
If date-based tables don't work out for you, then you can create a GSI with a `timestampHash` hash key and a `timestamp` range key, where `timestampHash` is a number between 1 and `y`, with `y` depending on your data size. Then you can do a range query against this GSI for every `timestampHash` value where `timestamp` is less than now, or whatever you set your purge parameters to. The `timestampHash` will help you partition your data, which helps with throughput.
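A minimal sketch of that purge loop with boto3. The table name, index name, key attribute names, and the bucket count `y` are assumptions for illustration; query pagination and retry handling are omitted:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("SensorLocations")  # assumed name

y = 16                                      # number of timestampHash buckets
cutoff = int(time.time()) - 2 * 24 * 3600   # purge anything older than 2 days

for bucket in range(1, y + 1):
    # Range query against the GSI for this bucket (index name assumed).
    resp = table.query(
        IndexName="timestampHash-timestamp-index",
        KeyConditionExpression=Key("timestampHash").eq(bucket)
        & Key("timestamp").lt(cutoff),
    )
    # A GSI always projects the base table's key attributes, so each item
    # carries enough to delete it from the base table.
    with table.batch_writer() as batch:
        for item in resp["Items"]:
            batch.delete_item(
                Key={"sensorID": item["sensorID"], "timestamp": item["timestamp"]}
            )
```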
Upvotes: 1