Reputation: 11
I have a table user table with Schema
|Column 1 | USER ID |int|
|Column 2 |EMAIL|chararray|
|Column 3 |LANGUAGE |chararray|
|Column 4 |LOCATION |chararray|
and a transactional table with schema
|Column 1 | ID |int|
|Column 2 |PRODUCT|int|
|Column 3 |USER ID |int|
|Column 4 |PURCHASE AMOUNT |double|
|Coulmn 5 |DESCRIPTION |chararray|
Ques...Find out the count of each product in distinctive Locations.
I have written a pig script as following :-
user = LOAD '/tmp/users.txt' USING PigStorage (' ')
AS (USER_ID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION:chararray);
transaction = LOAD '/tmp/transaction.txt' USING PigStorage (' ')
AS (ID:int, PRODUCT:int,USER_ID:int, PURCHASE_AMOUNT:double,DESCRIPTION:chararray);
u1 = JOIN user by USER_ID, transaction by USER_ID;
u2 = GROUP u1 by LOCATION;
Result = FOREACH u2 GENERATE COUNT(u2.PRODUCT);
DUMP Result;
Error---ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: Invalid scalar projection: u2
This is what i am getting.
Upvotes: 0
Views: 61
Reputation: 8396
In your script Result = FOREACH u2 GENERATE COUNT(u2.PRODUCT);
is wrong. After the group by operation, data structure changes. You can find that with describe u2
. Please try with following (I'm assuming the fields are comma separated in the txt file):
user = LOAD 'user.txt' USING PigStorage (',') AS (USER_ID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION:chararray);
transaction = LOAD 'transaction.txt' USING PigStorage (',') AS (ID:int, PRODUCT:int,USER_ID:int, PURCHASE_AMOUNT:double,DESCRIPTION:chararray);
u1 = JOIN user by USER_ID, transaction by USER_ID;
u2 = GROUP u1 by (LOCATION,PRODUCT);
Result = FOREACH u2 GENERATE FLATTEN(group) as (LOCATION,PRODUCT), COUNT($1);
DUMP Result;
Upvotes: 1