arsenal

Reputation: 24144

Java MapReduce job to calculate the percentage

Below is my Table (MyTable)

ID          TotalCount   ErrorCount   DT
----------------------------------------------
1345653         5           3       20120709
534140349       5           2       20120709
601806615       5           1       20120709
682527813       4           3       20120709
687612723       3           2       20120709
704318001       5           4       20120709
1345653         5           2       20120710
704318001       1           0       20120710
1120784094      3           2       20120711

So if I need to calculate the error percentage for a specific date in Hive using HiveQL, I would do it like this:

SELECT 100 * sum(ErrorCount*1.0) / sum(TotalCount) FROM MyTable 
where dt = '20120709'; 

But I need to do the same thing using Java MapReduce. Is there a way to do this in Java MapReduce code? One thing that confuses me: whenever we write a MapReduce job in Java, do we read the file(s) for that date partition, or do we read the table?

Update: Below is the table definition that will contain the above scenario

create table lipy
( buyer_id bigint,
  total_chkout bigint,
  total_errpds bigint
 )
 partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as sequencefile
location '/apps/hdmi-technology/lipy'
;

Upvotes: 0

Views: 1951

Answers (2)

Thomas Jungblut

Reputation: 20969

That is quite easy - let me take a shot at some pseudo code.

SELECT 100 * sum(ErrorCount*1.0) / sum(TotalCount) FROM MyTable 
where dt = '20120709'; 

Map Stage:

  • check whether the dt column equals 20120709
  • if it does, add the row's TotalCount to a running total sum and its ErrorCount to a running error sum (simple fields on the mapper work fine)
  • in cleanup emit two Key/Value pairs: -1 / total sum and 0 / error sum

Reduce stage: (the total sum arrives under key -1 and the error sum under key 0)

  • sum the values for key -1 and for key 0 separately
  • in cleanup you can calculate your percentage, and maybe send a mail if that is possible

Several things to note:

  • Map output is <IntWritable, IntWritable>, or <IntWritable, LongWritable> if the counts do not fit in an integer.
  • Set the number of reducers to 1, so a single reducer gets all the keys.

I believe that is everything to note - it is quite early here and I have had no coffee, so if you find a problem, feel free to tell me ;)
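The two stages above can be sketched as plain Java with no Hadoop dependencies, so the flow is easy to follow and run stand-alone; in a real job the two methods would become a `Mapper` and a `Reducer`, with the sums emitted from `cleanup()`. The class and method names here are purely illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch of the map/reduce flow described above.
// Rows are tab-delimited strings: ID, TotalCount, ErrorCount, dt,
// matching the table layout in the question.
public class ErrorPercentageSketch {

    static final int TOTAL_KEY = -1; // key carrying the summed TotalCount
    static final int ERROR_KEY = 0;  // key carrying the summed ErrorCount

    // "Map stage": filter on the date, accumulate both sums;
    // a real Mapper would emit these two pairs from cleanup().
    static Map<Integer, Long> map(List<String> rows, String dt) {
        long total = 0, errors = 0;
        for (String row : rows) {
            String[] f = row.split("\t");
            if (f[3].equals(dt)) {               // dt column filter
                total += Long.parseLong(f[1]);   // TotalCount
                errors += Long.parseLong(f[2]);  // ErrorCount
            }
        }
        Map<Integer, Long> out = new HashMap<>();
        out.put(TOTAL_KEY, total);
        out.put(ERROR_KEY, errors);
        return out;
    }

    // "Reduce stage": with a single reducer both keys arrive in one place,
    // so the percentage can be computed in cleanup().
    static double reduce(Map<Integer, Long> sums) {
        return 100.0 * sums.get(ERROR_KEY) / sums.get(TOTAL_KEY);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("1345653\t5\t3\t20120709");
        rows.add("534140349\t5\t2\t20120709");
        rows.add("1345653\t5\t2\t20120710");
        System.out.println(reduce(map(rows, "20120709"))); // prints 50.0
    }
}
```

With several mappers, each emits its own partial sums under keys -1 and 0; the single reducer then adds the partial sums per key before dividing, which is why setting the reducer count to 1 matters.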

Upvotes: 1

Edmon

Reputation: 4872

You can do this, but the implementation will depend on:

  1. Whether your tables are external. About locations:
     https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintofilesystemfromqueries
  2. How the data is formatted - row format, delimited, ...
     http://hive.apache.org/docs/r0.9.0/language_manual/data-manipulation-statements.html
  3. How you want to execute MapReduce. One very straightforward option is to run your Java MapReduce code as custom map/reduce scripts invoked from HiveQL:
     https://cwiki.apache.org/Hive/tutorial.html#Tutorial-Custommap%252Freducescripts
     or simply run your custom MapReduce over the Hive table data in HDFS.
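For the last option, note that a Hive partition is just an HDFS directory under the table location (here dt=20120709 under /apps/hdmi-technology/lipy), so a plain MapReduce job reads the partition's files directly. A minimal job-setup sketch, assuming hypothetical MyMapper/MyReducer classes implementing the map and reduce stages, and a hypothetical output path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ErrorPercentageJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "error-percentage");
        job.setJarByClass(ErrorPercentageJob.class);
        job.setMapperClass(MyMapper.class);     // hypothetical mapper
        job.setReducerClass(MyReducer.class);   // hypothetical reducer
        job.setNumReduceTasks(1);               // single reducer sees both keys
        // the table is STORED AS SEQUENCEFILE, so use the matching input format
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // the dt=20120709 partition is just a directory under the table location
        FileInputFormat.addInputPath(job,
                new Path("/apps/hdmi-technology/lipy/dt=20120709"));
        FileOutputFormat.setOutputPath(job,
                new Path("/tmp/lipy-error-pct")); // hypothetical output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This also answers the partition question from the post: MapReduce never reads "the table" - it reads the files in whichever partition directories you add as input paths.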

Upvotes: 0
