Reputation: 24144
Below is my Table (MyTable)
ID           TotalCount   ErrorCount   DT
----------------------------------------------
1345653           5            3       20120709
534140349         5            2       20120709
601806615         5            1       20120709
682527813         4            3       20120709
687612723         3            2       20120709
704318001         5            4       20120709
1345653           5            2       20120710
704318001         1            0       20120710
1120784094        3            2       20120711
So if I need to calculate the error percentage in Hive using HiveQL for a specific date, I would do it like this:
SELECT 100 * sum(ErrorCount*1.0) / sum(TotalCount) FROM MyTable
where dt = '20120709';
But I need to do the same thing using Java MapReduce. Is there any way to do this with a MapReduce job written in Java? First of all, I am confused: when we write a MapReduce job in Java, do we read the file(s) for that date partition, or do we read the whole table?
Update: Below is the table definition that will contain the above scenario.
create table lipy
( buyer_id bigint,
total_chkout bigint,
total_errpds bigint
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as sequencefile
location '/apps/hdmi-technology/lipy'
;
Upvotes: 0
Views: 1951
Reputation: 20969
That is quite easy - let me give it a shot with some pseudo code for your query:
SELECT 100 * sum(ErrorCount*1.0) / sum(TotalCount) FROM MyTable
where dt = '20120709';
Map stage: for each input row, check whether the dt column is equal to 20120709; if it is, emit two key/value pairs: -1 / totalcount and 0 / errorcount.
Reduce stage: you get the summed totalcount for key -1 and the summed error count for key 0; the final result is then 100 * errorSum / totalSum.
Several things to note: use <IntWritable, IntWritable> as the key/value types, or <IntWritable, LongWritable> if the count does not fit in an integer. I believe this is everything to note - it is quite early here and I had no coffee, so if you find a problem, feel free to tell me ;)
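The map/reduce flow described above can be sketched in plain Java. The snippet below simulates the shuffle in memory (no Hadoop dependency) so the logic is easy to follow; in a real job the same map and reduce bodies would move into Mapper/Reducer subclasses. Class and method names here are my own, not part of any existing code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ErrorPercentageSketch {

    // One row of MyTable: ID, TotalCount, ErrorCount, DT.
    record Row(long id, long totalCount, long errorCount, String dt) {}

    // Map stage: filter on the partition column and emit
    // key -1 -> totalCount and key 0 -> errorCount.
    static List<Map.Entry<Integer, Long>> map(Row row, String wantedDt) {
        List<Map.Entry<Integer, Long>> out = new ArrayList<>();
        if (row.dt().equals(wantedDt)) {
            out.add(Map.entry(-1, row.totalCount()));
            out.add(Map.entry(0, row.errorCount()));
        }
        return out;
    }

    // Reduce stage: sum all values per key (the shuffle is simulated
    // here by grouping into a HashMap keyed by -1 and 0).
    static Map<Integer, Long> reduce(List<Map.Entry<Integer, Long>> emitted) {
        Map<Integer, Long> sums = new HashMap<>();
        for (Map.Entry<Integer, Long> e : emitted) {
            sums.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return sums;
    }

    // Final computation: 100 * sum(ErrorCount) / sum(TotalCount).
    static double errorPercentage(List<Row> rows, String dt) {
        List<Map.Entry<Integer, Long>> emitted = new ArrayList<>();
        for (Row r : rows) {
            emitted.addAll(map(r, dt));
        }
        Map<Integer, Long> sums = reduce(emitted);
        return 100.0 * sums.getOrDefault(0, 0L) / sums.get(-1);
    }
}
```

For the sample data above, dt = 20120709 has TotalCount sums of 27 and ErrorCount sums of 15, so the result is 100 * 15 / 27 ≈ 55.56.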
Upvotes: 1
Reputation: 4872
You can do this but the implementation will depend on:
How the data is formatted (row format, delimited, ...):
http://hive.apache.org/docs/r0.9.0/language_manual/data-manipulation-statements.html
How you want to execute the MapReduce job. One very straightforward option is to run your Java MapReduce code as user defined functions (UDFs) that reuse HiveQL functions:
https://cwiki.apache.org/Hive/tutorial.html#Tutorial-Custommap%252Freducescripts
or simply run your custom mapreduce over Hive table data in HDFS.
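On the last option: a Hive partition is just a subdirectory under the table's HDFS location, so with the DDL from the question the dt='20120709' partition would sit under /apps/hdmi-technology/lipy/dt=20120709, and your job would point its input path at that directory rather than "reading the table". Each record value then holds the three columns separated by tabs (per the row format clause). A minimal sketch of the path construction and per-record parsing, with names I made up for illustration:

```java
public class LipyRecordParser {

    // Parsed columns of one lipy record
    // (buyer_id, total_chkout, total_errpds per the DDL).
    public record LipyRow(long buyerId, long totalChkout, long totalErrpds) {}

    // A Hive partition is a subdirectory under the table location,
    // e.g. /apps/hdmi-technology/lipy/dt=20120709 for dt='20120709'.
    public static String partitionPath(String tableLocation, String dt) {
        return tableLocation + "/dt=" + dt;
    }

    // Each record value is the row's columns joined by '\t'
    // (row format delimited fields terminated by '\t').
    public static LipyRow parse(String value) {
        String[] f = value.split("\t");
        return new LipyRow(Long.parseLong(f[0]),
                           Long.parseLong(f[1]),
                           Long.parseLong(f[2]));
    }
}
```

In the actual job, the mapper would receive these values from the sequencefile input and call something like parse() on each one before emitting counts.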
Upvotes: 0