Jit B

Reputation: 1246

hive unix_timestamp() UDF giving multiple values

I am using HQL to extract some data from a Hive table, while adding an extra column containing the current time.

Something like: select col1, col2, col3, unix_timestamp() from myTable;

I was expecting that all the records will have the same value in the fourth column.

I was expecting something like:

col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT

However I am getting something like this:

col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT3
col1Value, col2Value, col3Value, col4Value, timeT3

The dataset is not that large and only a single mapper is used. So my question is:

On a single machine, is unix_timestamp() evaluated for every row that is selected (each line in Hive's mapper), or is one value evaluated and used for all the rows?

I am using MapR M5 with Hive 0.9.0.

Upvotes: 0

Views: 643

Answers (1)

Lukas Vermeer

Reputation: 5940

According to the LanguageManual: "the context of a UDF's evaluate method is one row at a time". I believe this means your unix_timestamp() call would be evaluated during the mapping phase once for each record emitted.

Perhaps you could use a subquery to evaluate unix_timestamp() once and then join the result to your original query?
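
A rough sketch of that subquery idea, untested on Hive 0.9.0: the alias names m, t and ts are made up for illustration, and it assumes your Hive build accepts a JOIN without an ON clause (a cartesian product, which strict mode may reject). The subquery selects from myTable only because older Hive versions need a FROM clause, and LIMIT 1 keeps it to a single row so the timestamp is computed once.

```sql
-- Sketch only: evaluate unix_timestamp() in a one-row subquery,
-- then attach that single value to every row of myTable.
SELECT m.col1, m.col2, m.col3, t.ts
FROM myTable m
JOIN (
  -- One row, one timestamp value (ts is a hypothetical alias).
  SELECT unix_timestamp() AS ts
  FROM myTable
  LIMIT 1
) t;
```

Since t contains exactly one row, the join should not change the number of rows returned from myTable; it only adds the same ts value to each of them.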

Upvotes: 1
