Edamame
Edamame

Reputation: 25376

Impala built-in function: unix_timestamp function is not accurate enough

I am using the following impala query to extract data from my_table:

select id, timestamp, unix_timestamp(timestamp) as ts from my_table

The output is:

        id          timestamp               ts
-------------------------------------------------------
    0   A       2016-08-21 16:18:34.336     1471796314
    1   A       2016-08-21 16:18:34.517     1471796314
    2   A       2016-08-21 16:18:34.677     1471796314
    3   B       2016-08-14 00:20:00.641     1471134000
    4   B       2016-08-14 00:20:00.697     1471134000
    5   B       2016-08-14 00:20:00.720     1471134000
    6   B       2016-08-14 00:20:00.750     1471134000

The converted ts field becomes all the same because the original timestamp difference is very small. Is there a way (a function) which can make ts using higher digits (accuracy)? I need every ts to be different if the original timestamp is different. Thanks!

Upvotes: 0

Views: 504

Answers (1)

Samson Scharfrichter
Samson Scharfrichter

Reputation: 9067

With a recent version of Impala [caveat below], you should be able to extract the milliseconds and reconstruct a more precise timestamp.
Caveat - the Cloudera documentation states that the millisecond() function is available in CDH 5.7.0 which is clearly not the case.

SELECT Unix_Timestamp(timestamp)*1000 +Millisecond(timestamp) AS ts FROM wtf

By the way, be very careful about Time Zones. Impala assumes that all timestamps are provided as UTC, so if your data comes from multiple Data Centers -- or from a Data Center that applies DST -- your precision loss is a matter of hours, not of milliseconds...

Upvotes: 1

Related Questions