Encoding columns in Hive

Question

I'm importing a table from mysql to hive using Sqoop. Some columns are latin1 encoded. Is there any way to do either:

Set the encoding for those columns as latin1 in Hive. OR
Convert the columns to utf-8 while importing with sqoop?

Vlad the Impala · Accepted Answer

Turned out the problem was unrelated. The column works fine regardless of encoding...but the table's schema had changed in mysql. I assumed that since I'm passing in the overwrite flag, sqoop would remake the table every time in Hive. Not so! The schema changes in mysql didn't get transferred to Hive, so the data in the md5 column was actually data from a different column.

The "fix" we settled on was, before every sqoop import check for schema changes, and if there was a change, drop the table and re-import. This forces a schema update in Hive.

Edit: my original sqoop command was something like:

sqoop import --connect jdbc:mysql://HOST:PORT/DB --username USERNAME --password PASSWORD --table uploads --hive-table uploads --hive-import --hive-overwrite --split-by id --num-mappers 8 --hive-drop-import-delims --null-string '\N' --null-non-string '\N'

But now I manually issue a drop table uploads to hive first if the schema changes.

Encoding columns in Hive

Answers (2)

Related Questions