user5102232

Reputation:

converting date to timestamp in hive

I have a table in an RDBMS where the date format is '1986-12-01'. I am using Hive 0.8, not 0.12.

While importing the data into Hive I get NULL for the timestamp column. Is there any option to populate the table directly from the file (the data is pretty big), or do I have to use a staging table with STRING and then use a function to convert the data into a timestamp? If so, what would that look like?

Thanks!

Upvotes: 1

Views: 9700

Answers (1)

Miguel

Reputation: 1421

I will answer based on MySQL, because I see a tag with that RDBMS name on this post. So then, you have three options.

1. Transforming on the Sqoop query side

I assume here that you import your data using Sqoop. This tool has an option that allows you to import the result of a SQL query. In that query you can use the MySQL function UNIX_TIMESTAMP(date), which transforms a date into a Unix timestamp (wrapped around STR_TO_DATE(str, format) when the column is stored as a string). The Sqoop instruction will be something like this:

sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
    --username user \
    --password passwd \
    --query "SELECT col_1, ..., UNIX_TIMESTAMP(STR_TO_DATE(date_col, '%Y-%m-%d')) FROM table1 WHERE \$CONDITIONS" \
    -m 1 \
    --target-dir hive_table1_data

Notice that WHERE \$CONDITIONS is mandatory. Furthermore, I've assumed here that your date column is a string; if it is a DATE type, then STR_TO_DATE is not needed.
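
As a quick sanity check of what that expression produces, you can run it on the date from the question (this assumes the MySQL session time zone is UTC; UNIX_TIMESTAMP evaluates the date in the session time zone):

SELECT UNIX_TIMESTAMP(STR_TO_DATE('1986-12-01', '%Y-%m-%d'));
-- returns 533779200, the seconds elapsed since 1970-01-01 00:00:00 UTC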

2. On the RDBMS side

At this point I will assume that you have permission to create views in that database.

Another solution is to create a view that contains the date field transformed to a timestamp. As we saw above, the function is UNIX_TIMESTAMP(date). Here you have more details. In your case you should write something like this:

CREATE VIEW view_to_export AS
  SELECT
      field1,
      ...,
      UNIX_TIMESTAMP(STR_TO_DATE(field_date, '%Y-%m-%d')) AS field_date_ts
  FROM table1;

This assumes your date field is a string data type; if it is a DATE data type, STR_TO_DATE is not needed. The Sqoop instruction will be something like this:

sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
    --username user \
    --password passwd \
    --table view_to_export \
    -m 1 \
    --target-dir hive_table1_data

So then, with that view in place, you can import it with Sqoop without problems.
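
On the Hive side, a minimal sketch of an external table you could declare on top of the imported files (hypothetical table name, columns and HDFS path; Sqoop writes comma-delimited text by default, and the UNIX_TIMESTAMP values are plain integers, so BIGINT is a safe type for them):

-- hypothetical layout for the files Sqoop dropped in the target directory
CREATE EXTERNAL TABLE hive_table1 (
    col_1       STRING,
    date_col_ts BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/hive_table1_data';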

3. After importing the data

In case you use some other kind of import system but already have the data stored in HDFS, you can create a new table with that field transformed. To do that you could use something like this:

INSERT INTO TABLE my_table
SELECT
   col_1,
   ...,
   unix_timestamp(date_col, 'yyyy-MM-dd')
FROM exported_table;
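
For reference, the two tables involved might look like this (purely hypothetical schemas; unix_timestamp(string, pattern) in Hive returns a BIGINT with the seconds since the epoch):

-- staging table holding the raw imported rows, date still as a STRING
CREATE TABLE exported_table (
    col_1    STRING,
    date_col STRING
);

-- final table holding the converted value
CREATE TABLE my_table (
    col_1       STRING,
    date_col_ts BIGINT
);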

In this case you need more processing time, but it is independent of the way you import your data and it uses fewer resources on the MySQL server, because you don't have to compute the date-to-timestamp transformation for each row there; you delegate that to Hadoop, whose mission is to process huge amounts of data.

Upvotes: 3
