Reputation: 2158
I am running multiple plugins as cron jobs every day to gather data from different sources. I am planning to store this data in two places: Amazon DynamoDB and Amazon S3. The metadata of the results will be stored in DynamoDB, which will also hold the name of the S3 bucket where the data will be stored. My question is: how do I group these daily results in S3? I am thinking of a couple of ways:
(1) Let's say for plugin1, every time it runs I store the results in a new bucket, where the bucket name is <plugin-name>-<date>. The merit of this approach is that it is easy to retrieve the data for each day, but the demerit is that we now have 365 buckets for just one plugin. So if I have n plugins, I will have 365 times n buckets over a year. We could delete buckets after some time interval (say 3 months) to reduce their number.
(2) I could also use one bucket per plugin and use a GUID as the prefix for my keys, like guid/result_n, where result_n is the nth result I get for that plugin. I would also keep an item, let's call it plugin_runs, holding a list of dictionaries of the form {date: execution_id}. Then, for a given date, I could look up the execution_id, use it as the prefix, and retrieve the contents of those keys. A rough sketch of this scheme is below.
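Here is a minimal sketch of option (2) using boto3; the table name, key schema, and bucket name are all made up for illustration:

```python
import uuid
from datetime import date

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

# Hypothetical table with partition key "plugin_name" and sort key "run_date".
runs_table = dynamodb.Table("plugin_runs")

def store_results(plugin_name, bucket, results):
    """Store one run's results under a fresh GUID prefix and record the mapping."""
    execution_id = str(uuid.uuid4())
    for n, body in enumerate(results):
        s3.put_object(
            Bucket=bucket,
            Key="{}/result_{}".format(execution_id, n),
            Body=body,
        )
    # Map today's date to the execution_id so the keys can be found later.
    runs_table.put_item(Item={
        "plugin_name": plugin_name,
        "run_date": date.today().isoformat(),
        "execution_id": execution_id,
    })

def fetch_results(plugin_name, bucket, run_date):
    """Look up the GUID for a date, then list and read everything under that prefix."""
    item = runs_table.get_item(
        Key={"plugin_name": plugin_name, "run_date": run_date}
    )["Item"]
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=item["execution_id"] + "/")
    return [
        s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        for obj in resp.get("Contents", [])
    ]
```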
Which approach would be better? Any other suggestions?
Upvotes: 0
Views: 657
Reputation: 46869
Given that AWS will only allow you to create 100 buckets per account, I would say #2 is a much better approach.
But you really only need a single bucket, with a key prefix on each object to organize them. Here, for example, is how AWS Kinesis Firehose names the objects it creates for you. If that convention works for them, it should work for you:
Amazon S3 Object Name Format
Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH before putting objects to Amazon S3. The prefix translates into the Amazon S3 folder structure, where each label separated by a forward slash (/) becomes a sub-folder. You can modify this folder structure by adding your own top-level folder with a forward slash (for example, myApp/YYYY/MM/DD/HH) or prepending text to the YYYY top-level folder name (for example, myApp YYYY/MM/DD/HH). This is accomplished by specifying an S3 Prefix when creating the delivery stream, either by using the Firehose console or the Firehose API.
http://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html
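Here's a minimal sketch of that single-bucket, date-prefixed layout using boto3 (the bucket name and payload are made up):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-plugin-results"  # one bucket shared by all plugins (name is hypothetical)

def result_key(plugin_name, run_time, n):
    # Firehose-style UTC prefix: <plugin>/YYYY/MM/DD/HH/result_<n>
    return "{}/{}/result_{}".format(
        plugin_name, run_time.strftime("%Y/%m/%d/%H"), n
    )

# Write one result object under the date-based prefix.
now = datetime.now(timezone.utc)
s3.put_object(Bucket=BUCKET, Key=result_key("plugin1", now, 0), Body=b"...")

# Retrieve everything plugin1 produced on a given day by listing that prefix.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="plugin1/2016/05/12/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```

With this layout the date is encoded in the key itself, so you don't need a separate lookup table just to find a given day's objects.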
Upvotes: 1