Reputation: 97
There are 5 columns in each line and those 5 columns are commonly separated by comma
1 column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount
Hope you understood what data it contains
surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000
What i need is i want to see how much amount each person has spent via online mode and offline mode
basically i need the reducer output should like below
surender OFFLINE 20000
surender ONLINE 58000
Raja OFFLINE 8000
Raja ONLINE 32000
Kumar OFFLINE 4000
Kumar ONLINE 50000
And the final output should be like this:
surender 20000 58000
Raja 8000 32000
Kumar 4000 50000
You can give me a hive or pig query or either a mapreduce program
Upvotes: 0
Views: 123
Reputation: 25929
A = LOAD 'file_name' using PigStorage(',') as (name:chararray,date:chararray,product:chararray,mode:chararray,total:long);
B = GROUP A BY (name,mode);
C = FOREACH B GENERATE group.name as name,group.mode, SUM(total) as total;
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;
if your data like the sample you provided has different spellings then you need to convert to uppercase before grouping
Upvotes: 1