ben890

Reputation: 1133

Change datatype of column in Pig Latin

I have a relation in Pig Latin. It has a lot of columns, so I don't want to specify the data types when I load it. Is there a way to change a column's data type after the fact?

batters2 = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' using PigStorage(',');
filtered_batters = FOREACH batters2 GENERATE $0 as id, $5 as bats;
describe filtered_batters;
filtered_batters: {id: bytearray, bats: bytearray}

The reason I'm asking is that I'm trying to group by id and sum the bats column, and I'm getting an error. My thinking is that the data type is not suitable for summing: right now it's a bytearray, and I think it needs to be an int for me to sum it. Please let me know if this is correct and, if so, how to do the above.

Thanks

Upvotes: 0

Views: 2022

Answers (1)

nobody

Reputation: 11080

See CAST Operators. If you do not specify the datatype in the LOAD statement, Pig uses bytearray as the default datatype for the fields.

filtered_batters = FOREACH batters2 GENERATE (int)$0 as id, (int)$5 as bats;

OR

filtered_batters = FOREACH batters2 GENERATE $0 as id:int, $5 as bats:int;
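
To connect this back to the group-and-sum in the question, here is a minimal sketch. It assumes batters2 is the relation loaded from Batting.csv and that $5 holds integer values. The names grouped, totals and total_bats are made up for illustration. The id column is left uncast here on the assumption that it may not be numeric, since a cast that fails produces null rather than an error.

```pig
-- Cast only the column that will be summed; id stays a bytearray,
-- which is still fine as a grouping key.
filtered_batters = FOREACH batters2 GENERATE $0 AS id, (int)$5 AS bats;

-- Group the rows by id, then sum bats within each group.
grouped = GROUP filtered_batters BY id;
totals = FOREACH grouped GENERATE group AS id, SUM(filtered_batters.bats) AS total_bats;

DUMP totals;
```

SUM ignores null values, so rows where the cast to int fails (a header line, for example) should not break the total.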

Upvotes: 4
