Reputation: 105
I'm trying to parse a Date in a Pig script and i got the following error "Hadoop does not return any error message".
Here is the Date format example : 3/9/16 2:50 PM
And here is how I parse it :
data = LOAD 'cleaned.txt'
AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
You can see the data file here
Do you have any idea ? Thanks
EDIT:
It look like the error is caused by the STORE command on "times".
If I do a DUMP then I got:
ERROR 1066: Unable to open iterator for alias times
It happen only when I use the ToDate function, I have other scripts that work perfectly.
Upvotes: 0
Views: 721
Reputation: 2485
First of all, you need to specify the loader in the LOAD statement:
USING PigStorage('\t')
I assumed that you're using tab separator. Than if you have no schema specify the schema with type!
So you're load statement will be sg like this:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
For now I just use chararray type for everything, but you have to specify the type what is the right representation for you.
After this the date conversion just works fine as you wrote: (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z)
My test script:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
DUMP times;
UPDATE: Some explanation
By the way the default loader is pig storage
PigStorage is the default load function for the LOAD operator.
but it's nicer to specify. The original issue caused by the lack of datatype
If you don't assign types, fields default to type bytearray
so the ToDate failed on the input type.
Upvotes: 2