msm
msm

Reputation: 1

PIG Programming

How to write a PIG Query to get the count of presence of values in a field?

e.g:

Field A | Field B

20|ABC;

21|XYZ;

25|null;

99|WER;

45|null;

89|FOY;


Required O/P : Count of Field A = 6, Count of Field B = 4

Upvotes: 0

Views: 521

Answers (4)

Sivasakthi Jayaraman
Sivasakthi Jayaraman

Reputation: 4724

Pig doesn't treat the above input as null its basically a chararray, so all the built-in functions like(is null, is not null) will not work in this case. You need to group all the fields, filter out the null values and get the count. Can you try the below script?

input

20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;

PigScript:

A = LOAD 'input' USING PigStorage('|') AS (f1:int,f2:chararray);
B = GROUP A ALL;
C = FOREACH B {
                filterNull =  FILTER A BY (f2!='null;');
                GENERATE COUNT(A.f1) AS fieldA, COUNT(filterNull.f2) AS fieldB;
              }
DUMP C;

Output:

(6,4)

Upvotes: 8

Manindar
Manindar

Reputation: 998

Input:

20|ABC
21|XYZ
25|null
99|WER
45|null
89|FOY

Script:

inputData = LOAD 'input' using PigStorage('|');
grouped_input = GROUP inputData ALL;
counts = FOREACH grouped_input GENERATE COUNT($1), COUNT($2);
dump counts;

Upvotes: 0

harit singh
harit singh

Reputation: 1

please find the answer :- my sample data is

003 Amit Delhi India 12000 

004 Anil Delhi India 15000

005 Deepak Delhi India 34000

006 Fahed Agra India 45000

007 Ravi Patna India 98777

008 Avinash Punjab India 120000

009 Saajan Punjab India 54000

001 Harit Delhi India 20000

002 Hardy Agra India 20000

011  Banglore

its all separated by space

Code is as below :-

A = load '/edata' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);

s = group A ALL ;

result = foreach s generate COUNT(A.eid),COUNT(A.name),COUNT(A.country),COUNT(A.salary);

dump result ;

You will get following result :-

(10,9,9,9)

Upvotes: 0

Subba
Subba

Reputation: 11

Please find steps to follow to get the output

fieldcount =  load '/user/examples/stackoverflow/count.txt' using PigStorage('|') as (a:int, b:chararray); 

fieldcount1 = FOREACH fieldcount GENERATE a, REPLACE(b,';','') as b;

fieldcount2 = GROUP fieldcount1 ALL;

fieldcount3 = FOREACH fieldcount2 {
    a_cnt = FILTER fieldcount1 BY a is not null;
    b_cnt = FILTER fieldcount1 BY b is not null and b != 'null' ;
    GENERATE COUNT(a_cnt) as a_count, COUNT(b_cnt) as b_count;
}

Upvotes: 0

Related Questions