jnemecz
jnemecz

Reputation: 3608

Suitable storage method for huge amount of data

what kind of storage do you recommend for very huge amount of data? (≈ 50 milions records per day). Is this proper situation for systems like Hadoop or RDBMS is still sufficient for this purpose?

Upvotes: 0

Views: 139

Answers (1)

Olaf
Olaf

Reputation: 6289

With the amount of data you are describing, you might indeed be pushing into the Big Data terrirtory. Based on the amount of the details you provided, I would suggest loading raw data into Hadoop cluster, running map/reduce jobs to parse it and to load into date-based directories. You can then define an external Hive table partitioned by date (daily? weekly?) mapped to the results of your map/reduce jobs.

Next step would depend on the complexity of your reports and needed response time. If you can easily express them in SQL, you can just run queries on your Hive table. If they are more elaborated, you might have to write custom map/reduce jobs. Many suggest Pig for it, but I am personally more comforatble with the straight Java.

If you don't care about the response time of the reports, you can run them on-demand. If you care, but open to wait for the results for, say, tens of seconds or a few minutes, you can store report results also in Hive. If you want your reports to show up fast, say, in web-based or mobile UI, you might want to store the report data in a relational database.

Upvotes: 1

Related Questions