user2295350

Reputation: 303

Handling dates with Regex in Apache Pig

Assuming that the field time looks like 2013-01-01T00:00:00.000Z, piggybank.jar has already been imported, and the command EXTRACT has been defined (DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();), what's the best way to extract the fields year, month, day, hour, minute, and second? This is what I have done so far:

data = FOREACH data GENERATE FLATTEN(EXTRACT(time, '(\\d+)-(\\d+)-(\\d+)T(\\d+):(\\d+):(\\d+)\\.(\\S+)'))
        AS (
            year: int,
            month: int,
            day: int,
            hour: int,
            minute: int,
            second: int,
            tail: chararray
        );

Upvotes: 2

Views: 1872

Answers (1)

Frederic

Reputation: 3284

Since Pig 0.11 you can use the DateTime type.

A = LOAD 'data' AS (date:chararray);
B = FOREACH A GENERATE ToDate(date) AS date;
C = FOREACH B GENERATE GetMonth(date) as month;

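The other built-in DateTime functions (GetYear, GetDay, GetHour, GetMinute, GetSecond, and so on) work the same way; they are listed under "DateTime functions" in the Pig built-in functions documentation.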
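To get all six fields asked for in the question, something along these lines should work. This is only a sketch: it assumes Pig 0.11 or later, and the path 'data', the column name time, and the relation names A, B, C are placeholders.

```pig
-- Sketch, assuming Pig 0.11+ and one ISO 8601 timestamp per input line.
-- 'data' and the relation names are placeholders, not from a real job.
A = LOAD 'data' AS (time:chararray);

-- ToDate with a single chararray argument parses an ISO 8601 string.
B = FOREACH A GENERATE ToDate(time) AS dt;

-- Each Get* function returns one component of the DateTime as an int.
C = FOREACH B GENERATE
        GetYear(dt)   AS year,
        GetMonth(dt)  AS month,
        GetDay(dt)    AS day,
        GetHour(dt)   AS hour,
        GetMinute(dt) AS minute,
        GetSecond(dt) AS second;
```

This avoids the regex entirely, and the fields come out as int without any casting.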

If you're on a version older than 0.11, you can write a UDF or fall back on the regex you posted.
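For the regex route, here is a rough sketch that swaps in the built-in REGEX_EXTRACT_ALL in place of the piggybank EXTRACT, so no DEFINE is needed. It assumes every value has exactly the shape 2013-01-01T00:00:00.000Z; the path 'data' and the relation names are placeholders. The pattern has to match the whole string, the dot before the milliseconds is escaped, and the captured groups are read as chararray and cast to int in a separate step.

```pig
-- Sketch for Pig versions without the DateTime type.
-- REGEX_EXTRACT_ALL is a Pig built-in; the pattern must match the entire value.
A = LOAD 'data' AS (time:chararray);

-- Capture the six components; the trailing milliseconds and 'Z' are matched but not captured.
B = FOREACH A GENERATE FLATTEN(
        REGEX_EXTRACT_ALL(time,
            '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})\\.\\d+Z'))
    AS (year:chararray, month:chararray, day:chararray,
        hour:chararray, minute:chararray, second:chararray);

-- Cast the captured strings to int explicitly.
C = FOREACH B GENERATE
        (int)year   AS year,
        (int)month  AS month,
        (int)day    AS day,
        (int)hour   AS hour,
        (int)minute AS minute,
        (int)second AS second;
```

Rows whose time value does not match the pattern should come through with null fields, so it may be worth filtering on year IS NOT NULL afterwards.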

Upvotes: 4
