Reputation: 41690
I have an application which produces a large amount of data, that is all written once and then unchangeable (by law), and is rarely ever read. When it is read, it is always read in its entirety, as in, all the data for 2012 is read in one shot, and either processed for reporting or output in a different format for export (or gasp printed). The only way to access the data is to access an entire day's worth of data, or more than one day.
This data is easily represented as either two or three relational tables, or as a long list of self-contained documents.
What is the most storage-space-efficient way to store such data in a file system? Specifically, we're thinking of using Amazon S3 (File storage) for storage, though we could use something like RDS (their version of MySQL).
My current best bet is a gzipped file with JSON data for the entire day, one file per day.
Upvotes: 0
Views: 396
Reputation: 201537
Unless my data was pure ASCII (and even if it was), I would probably choose a binary storage method like one of
Upvotes: 1
Reputation: 27974
I would use Windows Azure's Table Storage because it allows for heterogenous structured data to be stored in a single table. Having a database-like storage will allow you to append data as needed. You can easily create new table for each year.
Upvotes: 0