Reputation: 23
I have loaded data into Hadoop using Pig, but when I dump the relation built from my CSV file, the population values look as if they had been divided by one million. Original CSV:
state population
California 39144818
Texas 27469114
Florida 20271272
Pig code to load:
statePopFile = LOAD 'hdfs:/home/ubuntu/final/gunData/statePops.csv' USING PigStorage(',');
stateRec = FOREACH statePopFile GENERATE $0 AS state, $1 AS population;
DUMP stateRec;
The output from the console looks like this
(California,"39)
(Texas,"27)
(Florida,"20)
Upvotes: 0
Views: 99
Reputation: 23
My problem was that I was loading the data and splitting on ','. That was cutting off the number. It was resolved by splitting on \t instead.
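A minimal sketch of that fix follows. It assumes the file is tab-delimited and that each population value is a quoted string with thousands separators, such as "39,144,818". That format is only inferred from the `"39` in the console output, and the post does not confirm it. The field names and types in the AS clause are likewise illustrative.

```pig
-- Assumption: fields are separated by tabs, so commas inside the
-- population value no longer split the record.
statePopFile = LOAD 'hdfs:/home/ubuntu/final/gunData/statePops.csv'
               USING PigStorage('\t')
               AS (state:chararray, population:chararray);

-- Assumption: population looks like "39,144,818". Strip the double
-- quotes and commas with a regex, then cast the remaining digits to long.
stateRec = FOREACH statePopFile GENERATE
               state,
               (long) REPLACE(population, '[",]', '') AS population;

DUMP stateRec;
```

If the file keeps its header line, the cast will most likely yield a null population for that row, so it may be worth filtering it out before further processing.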
Upvotes: 1