JasonA

Reputation: 314

Could not infer COUNT function

I'm trying to write a Pig Latin script that returns the number of records in a dataset I've filtered.

Here's the script so far:

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT($0);
DUMP scancount;

For some reason, I get the error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

What am I doing wrong here? I'm assuming it has something to do with the type of the field I'm passing in, but I can't seem to resolve this.

TIA, Jason

Upvotes: 9

Views: 13956

Answers (3)

Sanjiv

Reputation: 1815

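COUNT operates on a bag. In your FOREACH over productscans, $0 is the first field of each tuple (thetime, a long), so none of COUNT's signatures match. To get a bag to count, you need a preceding GROUP ALL statement for a global count, or a GROUP BY statement for per-group counts.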

You can use either of the following:

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
-- GROUP ... ALL collects every filtered tuple into a single bag named productscans
grouped         = GROUP productscans ALL;
scancount       = FOREACH grouped GENERATE COUNT(productscans);
DUMP scancount;

Or

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
-- $0 is the group key; $1 is the bag of grouped tuples
scancount       = FOREACH grouped GENERATE COUNT($1);
DUMP scancount;
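For the GROUP BY case mentioned at the top of this answer, here is a sketch that counts the filtered scans per category. Grouping by category is only an illustrative choice, and the aliases bycategory, catcounts and scans_in_category are made up for this example. The grouped relation has two fields: group (the key) and a bag named after the input relation (productscans). That is why COUNT(productscans) and COUNT($1) above refer to the same bag. DESCRIBE prints that schema if you want to check it.

```pig
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');
-- GROUP ... BY yields one tuple per distinct category: (group, productscans bag)
bycategory   = GROUP productscans BY category;
DESCRIBE bycategory;
-- COUNT runs once per group, over that group's bag
catcounts    = FOREACH bycategory GENERATE group AS category, COUNT(productscans) AS scans_in_category;
DUMP catcounts;
```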

Upvotes: 7

Chris White

Reputation: 30089

Is this what you're looking for? GROUP ... ALL brings everything into one bag, and then COUNT counts the items in it:

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
dump count;
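One caveat: according to the Pig documentation, COUNT skips a tuple whose first field is null. With this schema that field is thetime, so a row whose thetime failed to load as a long would not be counted. If such rows should be included, the built-in COUNT_STAR counts every tuple. Below is a sketch computing both side by side. The aliases counts, nonnull_rows and all_rows are illustrative names.

```pig
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');
grouped      = GROUP productscans ALL;
-- COUNT ignores tuples whose first field (thetime) is null; COUNT_STAR counts all tuples
counts       = FOREACH grouped GENERATE COUNT(productscans) AS nonnull_rows, COUNT_STAR(productscans) AS all_rows;
DUMP counts;
```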

Upvotes: 16

whoisjake

Reputation: 622

Maybe

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT(productscans);
DUMP scancount;

Upvotes: 0
