Manasa C

Reputation: 5

Issue with date format in PIG

I am new to Pig and am trying to analyse two months of the Uber dataset (UberDataSet.txt) to find out on which day the most trips were booked.

Sample records:

B02617,2/27/2015,1551,14677
B02598,2/27/2015,1114,10755
B02512,2/27/2015,272,2056
B02764,2/27/2015,4253,38780

Pig script 1:

A = Load 'UberDataSet.txt' using PigStorage(',') as 
(base:chararray, tripdate:datetime, cars:int, tripkms:int);

DESCRIBE A;

DUMP A;

DESCRIBE shows that tripdate is of type datetime, but in the DUMP output the date field is empty (,,) instead of showing the dates.

Output:

(B02682,,1395,12693)
(B02617,,1473,12811)
(B02764,,3934,31957)
(B02598,,1134,10661)
(B02617,,1539,14461)
(B02682,,1465,13814)
(B02512,,243,1797)

Then I tried this:

Pig script 2:

A = Load 'UberDataSet.txt' using PigStorage(',') as 
(base:chararray, tripdate:chararray, cars:int, tripkms:int);

B = FOREACH A GENERATE tripdate;

C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;

DESCRIBE C;

DUMP C;

The job failed with this error message:

Job DAG: job_1495878748804_1697
2017-06-10 16:58:32,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-06-10 16:58:32,790 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-25 Operator Key: scope-25) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "date"
Details at logfile: /home/manasa.testing_gmail/pig_1497109612992.log

There is a related question, "Loading datetime format files using PIG", but it did not give me a solution to my problem.

I also tried changing the date format in the line C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate; from 'yyyy-MM-dd' to 'MM/dd/yyyy', keeping the rest of the script the same, but I get the same date format error.

Can anyone help me get further?

Thanks in advance.

Upvotes: 0

Views: 1843

Answers (1)

Anurag Yadav

Reputation: 396

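You have to use your second Pig script, because Pig cannot load these dates directly as a datetime field. When a field is declared as datetime in the LOAD schema, Pig casts the text as an ISO 8601 date (for example 2015-02-27); a value such as 2/27/2015 fails that cast and is loaded as null, which is why DUMP shows an empty field (,,) in your first output.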
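A rough Python model of what the first script does, assuming the implicit cast from text to datetime accepts only ISO 8601 dates and turns anything else into null (None here). The function names are hypothetical; this illustrates the behaviour and is not Pig's loader code.

```python
from datetime import date
from typing import Optional


def implicit_datetime_cast(field: str) -> Optional[date]:
    """Hypothetical model of loading a field declared as tripdate:datetime.

    Assumption: only ISO 8601 text (e.g. 2015-02-27) is accepted;
    anything else fails the cast and becomes null (None).
    """
    try:
        return date.fromisoformat(field)
    except ValueError:
        return None


def format_tuple(fields) -> str:
    # Mimic DUMP output: a null field prints as nothing between the commas.
    return "(" + ",".join("" if f is None else str(f) for f in fields) + ")"


line = "B02617,2/27/2015,1551,14677"
base, tripdate, cars, tripkms = line.split(",")
row = (base, implicit_datetime_cast(tripdate), int(cars), int(tripkms))
print(format_tuple(row))  # (B02617,,1551,14677)
```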

Why it is not working:

The date format in your data file and the format you pass to ToDate in your script are not the same, which is why you get this error.

The dates in your file use the format 'MM/dd/yyyy' (for example 2/27/2015), but this line tells ToDate to expect 'yyyy-MM-dd' (for example 2015-02-27):

C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
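To see the mismatch outside Pig, here is a small Python sketch. It assumes that the strptime directives '%Y-%m-%d' and '%m/%d/%Y' behave like the Joda-style patterns 'yyyy-MM-dd' and 'MM/dd/yyyy' that ToDate takes; the PATTERNS table and the to_date helper are hypothetical names used only for illustration.

```python
from datetime import datetime

# Assumed strptime equivalents of the two patterns passed to ToDate.
PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "MM/dd/yyyy": "%m/%d/%Y",
}


def to_date(value: str, pig_pattern: str) -> datetime:
    """Parse value with the strptime equivalent of pig_pattern.

    Raises ValueError when the text does not match the pattern, much as
    ToDate raises IllegalArgumentException("Invalid format: ...").
    """
    return datetime.strptime(value, PATTERNS[pig_pattern])


for pattern in PATTERNS:
    try:
        print(pattern, "->", to_date("2/27/2015", pattern).date())
    except ValueError:
        print(pattern, "-> invalid format")
# yyyy-MM-dd -> invalid format
# MM/dd/yyyy -> 2015-02-27
```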

Solution: copy the lines below, replacing the path with the location of the file on your system:

A = Load '/tmp/a.log' using PigStorage(',') as (base:chararray, tripdate:chararray, cars:int, tripkms:int);

B = FOREACH A GENERATE tripdate;

C = FOREACH B GENERATE ToDate(tripdate,'MM/dd/yyyy') as mytripdate;

Running DUMP C; then gives output like this:

(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
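Note also that the error in the question reads Invalid format: "date", meaning ToDate was handed the literal text date. That would happen if UberDataSet.txt begins with a header line (an assumption; the sample records in the question do not show one), and it would also explain why switching to 'MM/dd/yyyy' produced the same error. In Pig such a line can be dropped before the conversion with a filter such as A2 = FILTER A BY tripdate != 'date'; (A2 is a hypothetical alias). The Python sketch below models that skip-then-parse logic; the header text in the sample is hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, Iterator, Tuple


def parse_trip_dates(lines: Iterable[str]) -> Iterator[Tuple[str, date]]:
    """Yield (base, trip date) pairs, skipping a header line if present.

    Assumption: a header line carries the literal word 'date' in the
    second column, as the error text Invalid format: "date" suggests.
    """
    for line in lines:
        base, tripdate, _cars, _tripkms = line.strip().split(",")
        if tripdate == "date":  # same idea as FILTER A BY tripdate != 'date'
            continue
        yield base, datetime.strptime(tripdate, "%m/%d/%Y").date()


sample = [
    "base,date,cars,tripkms",  # hypothetical header line
    "B02617,2/27/2015,1551,14677",
    "B02598,2/27/2015,1114,10755",
]
for base, trip_day in parse_trip_dates(sample):
    print(base, trip_day)
```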

If you want to format the date further, you can apply the ToString() function to it:

D = FOREACH C GENERATE ToString(mytripdate,'yyyy-MM-dd') as mytripdate;

which gives output like this:

(2015-02-27)
(2015-02-27)
(2015-02-27)
(2015-02-27)
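The question's actual goal was to find the day on which the most trips were booked. In Pig that would be a GROUP on the parsed date followed by SUM, then ORDER ... DESC and LIMIT 1. The Python sketch below models that pipeline on the four sample records from the question. It assumes the fourth column (named tripkms in the question's schema) holds the trip count and skips a possible header line whose date field is the literal text date; both are assumptions. The winning date is printed with strftime('%Y-%m-%d'), a rough counterpart of ToString(mytripdate,'yyyy-MM-dd').

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, Tuple


def busiest_day(lines: Iterable[str]) -> Tuple[date, int]:
    """Return (day, total) for the date with the largest summed 4th column.

    Models GROUP BY day / SUM / ORDER ... DESC / LIMIT 1.
    Assumption: the 4th column is the number of trips.
    """
    totals = defaultdict(int)  # day -> summed 4th column
    for line in lines:
        _base, tripdate, _cars, trips = line.strip().split(",")
        if tripdate == "date":  # skip a possible header line (assumption)
            continue
        day = datetime.strptime(tripdate, "%m/%d/%Y").date()
        totals[day] += int(trips)
    # On a tie, max() keeps the first day that was seen.
    return max(totals.items(), key=lambda item: item[1])


# The four sample records from the question.
sample = [
    "B02617,2/27/2015,1551,14677",
    "B02598,2/27/2015,1114,10755",
    "B02512,2/27/2015,272,2056",
    "B02764,2/27/2015,4253,38780",
]
day, total = busiest_day(sample)
print(day.strftime("%Y-%m-%d"), total)  # 2015-02-27 66268
```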

Upvotes: 0
