anusngh
anusngh

Reputation: 97

SQL-HIVE-PIG -Mapreduce

There are 5 columns in each line and those 5 columns are commonly separated by comma

1 column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount

Hope you understood what data it contains

surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000

What i need is i want to see how much amount each person has spent via online mode and offline mode

basically i need the reducer output should like below

surender   OFFLINE   20000
surender   ONLINE    58000
Raja       OFFLINE   8000
Raja       ONLINE    32000
Kumar      OFFLINE    4000
Kumar      ONLINE    50000

And the final output should be like this:

surender 20000  58000
Raja     8000   32000
Kumar     4000   50000 

You can give me a hive or pig query or either a mapreduce program

Upvotes: 0

Views: 123

Answers (1)

Arnon Rotem-Gal-Oz
Arnon Rotem-Gal-Oz

Reputation: 25929

A = LOAD 'file_name' using PigStorage(',') as (name:chararray,date:chararray,product:chararray,mode:chararray,total:long);
B = GROUP A BY (name,mode);
C = FOREACH B GENERATE group.name as name,group.mode, SUM(total) as total;
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;

if your data like the sample you provided has different spellings then you need to convert to uppercase before grouping

Upvotes: 1

Related Questions