karlo kilayko

Reputation: 60

Iterating over all items in SimpleDB

Let's say I have an AWS SimpleDB domain with around 3 million items. Each item has an attribute "foo" whose value is some arbitrary integer (which is of course actually stored in SimpleDB as a string, but let's ignore the conversion to and from for now). I would like to increment each item's foo value every 60 seconds until it reaches a maximum value (the maximum is not the same for each item; an item's maximum is stored as another attribute-value pair on the item), then reset foo to zero: read, increment, evaluate, store.

Given the large number of items, and the hard 60 second time limit, is this approach feasible in SimpleDB? Anyone have an approach to make this work?

Upvotes: 2

Views: 719

Answers (2)

Jeremy Wadhams

Reputation: 1812

Why not generate the value at read time from a trusted clock? I'm going to make up some names:

  • Touch_time - Epoch value (seconds since 1970) when the item was initialized to zero.
  • Max_age - Number of minutes after which the value wraps back around to zero.
  • Current_time - Epoch value of now.

So at any time, you can get the value you were proposing to store in an attribute by

(current_time - touch_time) % (max_age * 60)

This assumes max_age changes relatively infrequently, and that everyone trusts touch_time and current_time to within a minute; keeping clocks synchronized that closely is exactly what NTP is for.
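A minimal sketch of this computed-value approach (the function name is made up; the attribute names match the ones above, and current_time is passed in so the caller can use any trusted clock):

```python
def current_foo(current_time, touch_time, max_age):
    """Derive the counter value from the clock instead of storing it.

    current_time: epoch seconds of "now" (e.g. time.time()).
    touch_time:   epoch seconds when the item was initialized to zero.
    max_age:      minutes after which the value wraps back to zero.
    """
    return (current_time - touch_time) % (max_age * 60)


# Example: 60 seconds after touch_time with a 5-minute max_age,
# the derived value is 60; after exactly 5 minutes it wraps to 0.
print(current_foo(1_000_060, 1_000_000, 5))  # 60
print(current_foo(1_000_300, 1_000_000, 5))  # 0
```

No write traffic is needed at all with this scheme; the only stored state is touch_time and max_age, updated whenever an item is reset.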

Upvotes: 0

Mocky

Reputation: 7888

You can do it, but it is not practical. You only get between 100 and 300 PUTs per second for a single domain, while you can read upwards of 1,000 items per second, so writes will be the bottleneck.

To be on the conservative side, let's say 100 store operations per second per domain. You'd need 500 domains to open up enough throughput to store all 3 million items each minute. You only get 100 domains by default, so you'd have to ask AWS for more.
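The throughput arithmetic behind that domain count, as a quick sketch:

```python
items = 3_000_000
window_seconds = 60
puts_per_second_per_domain = 100  # conservative end of the 100-300 range

writes_per_second_needed = items / window_seconds      # 50,000 PUTs/sec
domains_needed = writes_per_second_needed / puts_per_second_per_domain

print(writes_per_second_needed)  # 50000.0
print(domains_needed)            # 500.0
```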

It would also be expensive. Writes with a small number of attributes cost about $3 per million and reads about $1.30 per million, which works out to roughly $13 per minute.
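Checking that per-minute figure (prices are the approximate rates quoted above):

```python
items_millions = 3.0            # 3 million items per 60-second pass
write_cost_per_million = 3.00   # ~$3 per million PUTs
read_cost_per_million = 1.30    # ~$1.30 per million GETs

cost_per_minute = items_millions * (write_cost_per_million +
                                    read_cost_per_million)
print(cost_per_minute)  # 12.9, i.e. about $13/minute
```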

The only thing I can really suggest is finding a way to combine the 3 million items into a smaller number of real items. If you could pack 50 "items" into each real item, you could do it with 10 domains at about $15.50 per hour. But I still wouldn't call that feasible, since you can get a cluster of 10 High-CPU Extra Large EC2 instances for $6.80 per hour.
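One way the packing might look, as a sketch: group 50 logical counters into one real item's attribute dict, ready to hand to a batched PUT (the "group-N" item names and "foo_<id>" attribute names are illustrative, not from the answer; SimpleDB stores everything as strings, hence the zero-padding):

```python
def pack_counters(counters, group_size=50):
    """Pack many logical counters into fewer SimpleDB items.

    counters: dict mapping a logical counter id -> integer value.
    Returns a dict of item_name -> attribute dict, where each real
    item holds up to group_size counters as zero-padded strings.
    """
    items = {}
    ids = sorted(counters)
    for i in range(0, len(ids), group_size):
        name = "group-%d" % (i // group_size)  # hypothetical naming scheme
        items[name] = {"foo_%s" % cid: "%010d" % counters[cid]
                       for cid in ids[i:i + group_size]}
    return items


# 120 logical counters pack into 3 real items (50 + 50 + 20).
packed = pack_counters({i: i for i in range(120)})
print(len(packed))             # 3
print(len(packed["group-0"]))  # 50
```

This cuts the write count by 50x, at the cost of reading and rewriting a whole group to update any one counter in it.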

Upvotes: 1
