Stanleyrr
Stanleyrr

Reputation: 875

HIVE rendered timestamp column data as NULL

I am trying to create an external table using Hive. Below is the Hive query I ran:

create external table trips_raw
(
VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  location '/user/taxi_trips/';

When I looked at the output from the 'trips_raw' table created by the query above, I saw that both the 'tpep_pickup_date_time' and 'tpep_dropoff_datetime' columns are 'NULL' in all rows. I have seen other threads talked about the reason being that the '1/1/2018 11:13:00 AM' timestamp format is not accepted by Hive, but problem is that's the timestamp format I have in my csv source data (as you can see from screenshot here).

enter image description here

I could specify those 2 timestamp columns as 'string' and Hive will be able to render them correctly, but I still would want those 2 columns to be 'timestamp' type so specifying those 2 columns as 'string' type is not a viable option here.

I had also tried the following technique using recommendation from this site (https://community.hortonworks.com/questions/55266/hive-date-time-problem.html) but had no success:

  1. Create the 'trips_raw' table using 'string' as type for the 2 timestamp columns. This allows the resulting table to render the timestamps correctly, albeit in 'string' type. The Hive command I used is shown below:

    create external table trips_raw
    (
    VendorID int,
    tpep_pickup_datetime string,
    tpep_dropoff_datetime string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  location 
    '/user/taxi_trips/';
    

When I look at the resulting table, the dates are shown as string as you can see from this screenshot below.

enter image description here

But as I had mentioned earlier, I want the time columns to be in timestamp type and not string type. Therefore in the next 2 steps I tried to create a blank table and then insert the data from the table created from Step 1 but converting the string to timestamp this time.

  1. Create an external blank table called 'trips_not_raw' using the following Hive commands:

    create external table trips_not_raw
    (VendorID int,
    tpep_pickup_datetime timestamp,
    tpep_dropoff_datetime timestamp
    );
    
  2. Insert data from 'trips_raw' table (which was mentioned earlier in this question), using the Hive commands below:

    insert into table trips_not_raw select vendorid,
    from_unixtime(unix_timestamp(tpep_pickup_datetime, 'MM/dd/yyyy HH:mm:ss 
    aa')) as tpep_pickup_datetime,
    from_unixtime(unix_timestamp(tpep_dropoff_datetime, 'MM/dd/yyyy HH:mm:ss 
    aa')) as tpep_dropoff_datetime
    from trips_raw; 
    

Doing this inserts the rows into the blank table 'trips_not_raw', but the results from the 2 timestamp columns still showed as 'Null' as you can see from the screenshot below:

enter image description here

Is there a simple way to store the 2 time columns as 'timestamp' type and not 'string', but still be able to render them correctly in the output without seeing 'Null/None'?

Upvotes: 0

Views: 2116

Answers (1)

serge_k
serge_k

Reputation: 1772

I'm afraid you need to parse timestamp column and then cast string as timestamp. For example,

select cast(regexp_replace('1/1/2018 11:13:00 AM', '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp)

You can create and use a macro function for convenience, e.g.,

create temporary macro parse_date (ts string)
  cast(regexp_replace(ts, '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp);

then use it as follows

select parse_date('1/1/2018 11:13:00 AM');

Upvotes: 1

Related Questions