Reputation: 5
I am new to Pig and am trying to analyse two months of Uber data (UberDataSet) to find out on which day the most trips were booked.
Format:
B02617,2/27/2015,1551,14677
B02598,2/27/2015,1114,10755
B02512,2/27/2015,272,2056
B02764,2/27/2015,4253,38780
Pig Script1:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:datetime, cars:int, tripkms:int);
DESCRIBE A;
DUMP A;
DESCRIBE shows that tripdate is of type datetime, but in the output the date field is empty (,,) instead of containing the dates.
Output:
(B02682,,1395,12693)
(B02617,,1473,12811)
(B02764,,3934,31957)
(B02598,,1134,10661)
(B02617,,1539,14461)
(B02682,,1465,13814)
(B02512,,243,1797)
Then I tried it like this.
Pig Script2:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:chararray, cars:int, tripkms:int);
B = FOREACH A GENERATE tripdate;
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
DESCRIBE C;
DUMP C;
The job failed with this error message:
Job DAG: job_1495878748804_1697
2017-06-10 16:58:32,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-06-10 16:58:32,790 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-25 Operator Key: scope-25) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "date"
Details at logfile: /home/manasa.testing_gmail/pig_1497109612992.log
There is a related question, "Loading datetime format files using PIG", but I could not find a solution to my problem there.
I also tried changing the date format to 'MM/dd/yyyy' in the line
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
keeping the rest of the script the same, but I get the same error about the date format.
Can anyone help me get further?
Thanks in advance.
Upvotes: 0
Views: 1843
Reputation: 396
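Use your second Pig script. Pig cannot load this column directly as datetime: the implicit cast expects ISO-style dates (such as 2015-02-27), so values like 2/27/2015 fail to convert and come out empty.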
Why it is not working:
The date format in your dataset does not match the format string you pass to ToDate in the Pig script. That is why you are getting this error.
The dates in your log are in 'MM/dd/yyyy' format (for example 2/27/2015), while your script tells ToDate to expect 'yyyy-MM-dd':
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
Solution: copy the lines below, replacing the path with the location of the log on your system:
A = Load '/tmp/a.log' using PigStorage(',') as (base:chararray, tripdate:chararray, cars:int, tripkms:int);
B = FOREACH A GENERATE tripdate;
C = FOREACH B GENERATE ToDate(tripdate,'MM/dd/yyyy') as mytripdate;
You will get output like this:
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
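One more thing worth checking: the exception quotes the value it could not parse, "date", which is not a date at all. That suggests (this is an assumption, I have not seen your file) that the file starts with a header row whose second field is the word date, which would also explain why switching to 'MM/dd/yyyy' gave you the same error. A minimal sketch that drops such rows before calling ToDate, using the file name and field names from your question:

```pig
-- Load the date column as text so that non-date rows can be filtered out first.
A = LOAD 'UberDataSet.txt' USING PigStorage(',')
    AS (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Keep only rows whose second field looks like M/d/yyyy.
-- MATCHES must match the whole string, so a header value such as "date" is dropped.
B = FILTER A BY tripdate MATCHES '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}';

-- Now every remaining value can be parsed with the format used in the data.
C = FOREACH B GENERATE ToDate(tripdate, 'MM/dd/yyyy') AS mytripdate;
DUMP C;
```

If your file has no header row, the FILTER is harmless and simply passes every row through.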
If you want the date in a different format, you can apply the ToString() function to it:
D = FOREACH C GENERATE ToString(mytripdate,'yyyy-MM-dd') as mytripdate;
You will get output like this:
(2015-02-27)
(2015-02-27)
(2015-02-27)
(2015-02-27)
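To get from here to your actual goal (the day with the most trips), something along these lines might work. It is only a sketch under two assumptions: that the fourth column, which your script calls tripkms, holds the number of trips for that row, and that "day" means calendar date. Swap in whichever column really holds the trip count.

```pig
A = LOAD 'UberDataSet.txt' USING PigStorage(',')
    AS (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Skip rows whose date field is not M/d/yyyy (for example a header row).
B = FILTER A BY tripdate MATCHES '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}';

-- Normalise the date to yyyy-MM-dd text so it can serve as a grouping key.
C = FOREACH B GENERATE
        ToString(ToDate(tripdate, 'MM/dd/yyyy'), 'yyyy-MM-dd') AS tripday,
        tripkms;

-- Sum the assumed trip-count column per day across all bases.
D = GROUP C BY tripday;
E = FOREACH D GENERATE group AS tripday, SUM(C.tripkms) AS total_trips;

-- Sort descending and keep the top day.
F = ORDER E BY total_trips DESC;
G = LIMIT F 1;
DUMP G;
```

If you want the busiest weekday rather than the busiest date, changing the ToString pattern to 'EEEE' should give you the day name to group on instead.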
Upvotes: 0