simplfuzz

Reputation: 12895

Dividing counts in Pig Script

ch = LOAD 'ch.txt';
ch_all = GROUP ch ALL;
ch_count = FOREACH ch_all GENERATE COUNT(ch);

ca = LOAD 'ca.txt';
ca_all = GROUP ca ALL;
ca_count = FOREACH ca_all GENERATE COUNT(ca);

The Pig script above computes two counts. I now want to divide ch_count by ca_count and store the result in a file. How do I do that?

Upvotes: 1

Views: 4409

Answers (1)

Romain

Reputation: 7082

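Pig has no operator that divides one relation by another directly, but you can give both counts the same constant key and JOIN them, so that the two values land in a single tuple: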

Pig:

ch = LOAD 'ch.txt';
ch_all = GROUP ch ALL;
ch_count = FOREACH ch_all GENERATE 'same' AS key, (DOUBLE) COUNT(ch) AS ct;

ca = LOAD 'ca.txt';
ca_all = GROUP ca ALL;
ca_count = FOREACH ca_all GENERATE 'same' AS key, (DOUBLE) COUNT(ca) AS ct;

ca_ch = JOIN ch_count BY key, ca_count BY key;

ca_ch_div = FOREACH ca_ch GENERATE ch_count::ct / ca_count::ct;

DUMP ca_ch_div;

Output:

(0.6666666666666666)

Input:

cat ch.txt 
1
2
cat ca.txt 
1
2
3
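
Since the question also asks about writing the result to a file, here is a minimal sketch of an alternative, assuming a Pig release that supports casting a single-tuple relation to a scalar (I believe this arrived in Pig 0.8). It avoids the artificial join key by reading ca_count.ct as a scalar, and it uses STORE instead of DUMP. The output directory 'ratio_out' and the alias ch_ca_ratio are hypothetical names chosen for illustration.

```pig
-- Count the rows of each input, as in the question.
ch = LOAD 'ch.txt';
ch_all = GROUP ch ALL;
ch_count = FOREACH ch_all GENERATE COUNT(ch) AS ct;

ca = LOAD 'ca.txt';
ca_all = GROUP ca ALL;
ca_count = FOREACH ca_all GENERATE COUNT(ca) AS ct;

-- ca_count holds exactly one tuple, so ca_count.ct can be used as a scalar.
-- Cast to double to avoid integer division of the two long counts.
ch_ca_ratio = FOREACH ch_count GENERATE (double) ct / (double) ca_count.ct AS ratio;

-- Write the result to a directory ('ratio_out' is a hypothetical path).
STORE ch_ca_ratio INTO 'ratio_out';
```

The scalar form only works when the referenced relation contains a single tuple, and Pig raises a runtime error otherwise. The JOIN version above can be written to a file the same way by replacing its DUMP statement with a STORE.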

Upvotes: 2
