Reputation: 99
I'm trying to analyze some data with apache pig in amabari
In this specific case I have a problem... I have a column with the timestamp written in this specific format
1433306146000
1422814565122
with the online converter if I put in input 1433306146000 the output is 08/30/47389 @ 12:06pm (UTC) but this is not true, in fact if I manually delete the last 3 numbers (1433306146) the result is 06/03/2015 @ 4:35am (UTC).
So, ther's a method in pig to delete this last 3 numbers in timestamp column? And also the main question is : with the function
GetDay(datetime)
if I pass the timestamp it will work? or there's a different method to produce, from the timestamp, the associate week day?
Thanks
Upvotes: 1
Views: 6917
Reputation: 826
Not sure which converter you have used for your timestamps but here it shows propper format. ToDate operator does not act on the complex data type such as bag. That is why you get an error trying to use it on group. Also sometimes behaviour with chararrays is unpredictable thus rule of thumb is to use long primitive type for timestamps. The code below should work.
input=LOAD '..Fra/data/prova/*' AS(old_timestamp:long,timestamp:long);
result=FOREACH input GENERATE
GetDay(ToDate(old_timestamp)),
GetDay(ToDate(timestamp));
DUMP result;
Also you can also obviously cast from chararray to long in your case if you do not want to change the schema. Only this will change:
result=FOREACH input GENERATE
GetDay(ToDate((long)old_timestamp)),
GetDay(ToDate((long)timestamp));
Also be careful that ToDate will ALWAYS needs a milliseconds so if you will cut them off you would expect to see Unix default start time of 1970-01-01 Cheers!
Upvotes: 2
Reputation: 99
@inquisitive_mind I tried the full string (including the offset) but it not work, the error is:
"invalid format 1422762920000 is malformed at 0000"
so i tried to use substring in this way
SUBSTRING grp = FOREACH y GENERATE (timestamp),SUBSTRING(timestamp,0,9);
and then I stored it into a new file for manipulate it in freedom.The output produced was two colum: in the first it was the original timestamp, in the second it was the new timestamp without the last three chars, So I tried to launch this new script:
y=LOAD '..Fra/data/prova/*' as (old_timestamp:chararray,timestamp:chararray);
grp=group y by timestamp;
result=foreach grp generate GetDay(ToDate(group));
dump result;
the output is:
"...invalid format 1422762920 malformed at 0"
So even if I knew that while deleting another char from the previus code I would try so I changed the range of the substring function (timestamp,0,10). And now i tried to relunch the script but the output produced is (1),(1),(1),(1),(1), I'm desperete!
Upvotes: 0
Reputation: 11080
What you have is the offset from 1970-01-01T00:00:00.000Z in terms of milliseconds.You can pass that milliseconds to ToDate .It takes Unix timestamp as input and returns datetime object. Once you get the datetime object, you can then use the GetDay function.No need to remove last 3 numbers
GetDay(ToDate(1433306146000))
Note:If you really want to remove 3 numbers then load the data as chararray and use SUBSTRING function.
Upvotes: 0