Reputation: 303
Assuming that the field time looks like 2013-01-01T00:00:00.000Z, that piggybank.jar has already been imported, and that the command EXTRACT has been defined (DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();), what's the best way to extract the fields year, month, day, hour, minute, and second? This is what I have so far:
data = FOREACH data GENERATE FLATTEN(EXTRACT(time, '(\\d+)-(\\d+)-(\\d+)T(\\d+):(\\d+):(\\d+)\\.(\\S+)'))
AS (
year: int,
month: int,
day: int,
hour: int,
minute: int,
second: int,
tail: chararray
);
Upvotes: 2
Views: 1872
Reputation: 3284
Since Pig 0.11 you can use the DateTime type.
A = LOAD 'data' AS (date:chararray);
B = FOREACH A GENERATE ToDate(date) AS date;
C = FOREACH B GENERATE GetMonth(date) as month;
The other accessors (GetYear, GetDay, GetHour, GetMinute, GetSecond, and so on) are listed under DateTime functions in the Pig documentation.
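As a rough sketch of how this could cover all six fields from the question: the snippet below assumes Pig 0.11 or later, an input path 'data', and a chararray field named time holding ISO 8601 strings such as 2013-01-01T00:00:00.000Z. The relation names and the intermediate alias dt are only illustrative.

```pig
-- Sketch only: the path 'data' and the field name 'time' are assumptions based on the question.
A = LOAD 'data' AS (time:chararray);

-- ToDate with a single chararray argument parses an ISO 8601 string into a datetime.
B = FOREACH A GENERATE ToDate(time) AS dt;

-- Each Get* function returns one component of the datetime as an int.
C = FOREACH B GENERATE
    GetYear(dt)   AS year,
    GetMonth(dt)  AS month,
    GetDay(dt)    AS day,
    GetHour(dt)   AS hour,
    GetMinute(dt) AS minute,
    GetSecond(dt) AS second;
```

Compared with the regex, this leaves the parsing to Pig and avoids carrying a leftover tail field for the milliseconds and time zone.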
If you're on a version older than 0.11, you can write a UDF or fall back on the regex you posted.
Upvotes: 4