jay doank

Reputation: 91

How to sum two log files in Pig

I have a problem summing values across two log files.

example files:

  1. file-1

    id user view
    1  AAA  2
    2  BBB  5
    3  CCC  9

  2. file-2

    id user view address
    1  AAA  5    XXX
    2  BBB  2    YYY
    6  FFF  4    ZZZ

I want to combine the two files by id and sum the view column. This is the output I am hoping for:

output:

id user view address
1  AAA  7    XXX
2  BBB  7    YYY

I tried joining the two files, but I could not work out how to sum the view values:

My code:

inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray, 
    user:chararray, 
    view:int
);


inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray, 
    user:chararray, 
    view:int,
    address:chararray
);


joined = JOIN inputdata BY id LEFT OUTER, inputdata2 by id;

outputlist = FOREACH joined {

        GENERATE
        inputdata::id, 
        inputdata::user, 
        --sum(inputdata2::view), 
        inputdata2::address;


}

dump outputlist;

My question: how do I sum view across the two log files?

Thanks.

Upvotes: 0

Views: 64

Answers (1)

Vignesh I

Reputation: 2221

Pass the join result through a FOREACH and add the two view values together. Because this is an inner join, only the ids present in both files (1 and 2) are kept.

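-- Load both space-delimited files; field c holds the view count.
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- Inner join on id, then add the two view counts per matched row.
C = JOIN A BY a, B BY a;
D = FOREACH C GENERATE A::a AS id, A::b AS user, A::c + B::c AS view, B::d AS address;
DUMP D;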

Output:

(1,AAA,7,XXX)
(2,BBB,7,YYY)
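
If the LEFT OUTER join from the question is kept instead, rows from file-1 with no match in file-2 (id 3 here) come through with nulls on the file-2 side. In Pig, adding null to an int gives null, so the sum needs a guard. The following is a minimal sketch, not tested against the asker's data; it assumes the same space-delimited 'file1.dat' and 'file2.dat' used in the answer above.

```pig
-- Assumption: same file names and space delimiter as in the answer above.
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);

-- Keep every row of A; B's fields are null where no id matches.
C = JOIN A BY a LEFT OUTER, B BY a;

-- Treat a missing view count from B as 0 so the sum is not null.
D = FOREACH C GENERATE
        A::a AS id,
        A::b AS user,
        A::c + (B::c IS NULL ? 0 : B::c) AS view,
        B::d AS address;

DUMP D;
```

With the sample data this should keep id 3 with its own view count and an empty address, in addition to ids 1 and 2. Note that the built-in SUM is an aggregate over a bag (typically after GROUP), which is why it does not apply to two scalar fields of a joined row; plain `+` is the right operator there.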

Upvotes: 2
