MapReduce & Hive application Design

Question

I have a design question where in in my CDH 4.1.2(Cloudera) installation I have daily rolling log data dumped into the HDFS. I have some reports to calculate the success and failure rates per day.

I have two approaches

load the daily log data into Hive Tables and create a complex query.
Run a MapReduce job upfront everyday to generate the summary (which is essentially few lines) and keep appending to a common file which is a Hive Table. Later while running the report I could use a simple select query to fetch the summary.

I am trying to understand which would be a better approach among the two or if there is a better one.

The second approach adds some complexity in terms of merging files. If not merged I would have lots of very small files which seems to be a bad idea.

Your inputs are appreciated.

Thanks

MapReduce & Hive application Design

Answers (1)

Related Questions

MapReduce &amp; Hive application Design

Answers (1)

Related Questions

MapReduce & Hive application Design