Reputation: 1281
I have the following timestamp value: Wed Jun 25 09:18:15 +0000 2014.
I am writing a MapReduce program in Python that reads JSON objects from an Amazon S3 location and exports them to a local CSV file. The data in the CSV file will then be loaded into MySQL and HBase databases. I have about 200 million records (1 TB), so I need to optimize every processing step.
What data type should I use to store the timestamp value in Python, the CSV file, MySQL and HBase? I need to store all aspects of the timestamp value. My schema has 4 columns in the CSV file and in the MySQL and HBase tables.
Thanks!
Upvotes: 1
Views: 584
Reputation: 23339
Use a long to represent the time (milliseconds since the epoch), so you don't have to bother with date formatting or string encoding. It's space efficient and makes range queries much easier.
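As a rough sketch of that conversion in Python 3: the format string below is my reading of the sample value in the question, the function names are made up for illustration, and the day and month names are assumed to be English (`%a` and `%b` depend on the locale). The resulting integer fits a BIGINT column in MySQL and can be written as plain digits in the CSV.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of values such as "Wed Jun 25 09:18:15 +0000 2014".
RAW_FORMAT = "%a %b %d %H:%M:%S %z %Y"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(raw):
    """Parse the raw string and return milliseconds since the Unix epoch (UTC)."""
    parsed = datetime.strptime(raw, RAW_FORMAT)  # timezone-aware because of %z
    # Integer division by a timedelta avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Turn the stored integer back into a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    millis = to_epoch_millis("Wed Jun 25 09:18:15 +0000 2014")
    print(millis)
    print(from_epoch_millis(millis).isoformat())
```

Because the offset is applied before the value is stored, every row ends up in UTC; if the original offset itself matters, it would need its own column.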
Upvotes: 2