Reputation:
I have a table in an RDBMS where the dates are formatted like '1986-12-01'. I am using Hive 0.8, not 0.12.
When I import the data into Hive I get NULL for the timestamp column. Is there any option to populate the table directly from the file (the data is pretty big), or do I have to use a staging table with a String column and then a function to convert the data into a timestamp? If so, what would that look like?
Thanks !
Upvotes: 1
Views: 9700
Reputation: 1421
I will answer based on MySQL, because I see a tag with that RDBMS name in this post. You have three options.
I assume here that you import your data using Sqoop. This tool has an option that allows you to import the result of a SQL query. In that query you can use the MySQL function UNIX_TIMESTAMP(date), which transforms a Date into a Unix timestamp. The Sqoop instruction would be something like this:
sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
--username user \
--password passwd \
--query "SELECT col_1, ..., UNIX_TIMESTAMP(STR_TO_DATE(date_col, '%Y-%m-%d')) FROM table1 WHERE \$CONDITIONS" \
-m 1 \
--target-dir hive_table1_data
Notice that WHERE \$CONDITIONS is mandatory. Furthermore, I've assumed here that your date column is a string; if it is a DATE type, then STR_TO_DATE is not needed.
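For example, if date_col were already a DATE in MySQL, a sketch of the same import without STR_TO_DATE (keeping the placeholder host, user, table, and column names from above) would be:
sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
--username user \
--password passwd \
--query "SELECT col_1, ..., UNIX_TIMESTAMP(date_col) FROM table1 WHERE \$CONDITIONS" \
-m 1 \
--target-dir hive_table1_data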
At this point I will assume that you have permission to CREATE VIEWs in that database.
Another solution is to create a view that exposes the date field already transformed to a timestamp. As we saw above, the function is UNIX_TIMESTAMP(date); the MySQL documentation has more details. In your case you would write something like this:
CREATE VIEW view_to_export AS
SELECT
    field1,
    ...,
    UNIX_TIMESTAMP(STR_TO_DATE(field_date, '%Y-%m-%d')) AS field_date_ts
FROM mysql_table_name;
Again I'm assuming your date field is a string data type; if it is a DATE, STR_TO_DATE is not needed. The Sqoop instruction would be something like this:
sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
--username user \
--password passwd \
--table view_to_export \
-m 1 \
--target-dir hive_table1_data
So then, with that view in place, you can run the Sqoop import without problems.
In case you use some other kind of import mechanism but already have the data stored in HDFS, you can create a new table with that field transformed. To do that you could use something like this:
INSERT INTO TABLE my_table
SELECT
    col_1,
    ...,
    unix_timestamp(date_col, 'yyyy-MM-dd')
FROM exported_table;
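If you go this route you also need the two Hive tables themselves. Here is a minimal sketch, showing only the two columns used above and assuming the files in HDFS are comma-separated text (Sqoop's default); the HDFS path and column names are just placeholders:
-- External table over the files already in HDFS; the date is still plain text like '1986-12-01'
CREATE EXTERNAL TABLE exported_table (
    col_1 STRING,
    date_col STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/youruser/hive_table1_data';

-- Target table; unix_timestamp() returns seconds since the epoch as a BIGINT
CREATE TABLE my_table (
    col_1 STRING,
    date_col BIGINT);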
In this case you need more processing time, but it is independent of the way you import your data, and it uses fewer resources on the MySQL server, because you don't have to compute the date-to-timestamp transformation for each row there; you delegate that to Hadoop, whose mission is to process huge amounts of data.
Upvotes: 3