Chetan Pulate

Reputation: 503

Hive UTF-8 encoding: how many characters are supported?

The data I want to insert into a Hive table contains Latin words and is UTF-8 encoded, but Hive still does not display it properly.

Actual data: [image: Actual Data]

Data inserted in Hive: [image: Hive Data]

I also changed the encoding of the table to UTF-8, but the issue remains. The Hive DDL and commands are below:

CREATE TABLE IF NOT EXISTS test6
(
CONTACT_RECORD_ID    string,
ACCOUNT    string,
CUST    string,
NUMBER    string,
NUMBER1    string,
NUMBER2    string,
NUMBER3    string,
NUMBER4    string,
NUMBER5    string,
NUMBER6    string,
NUMBER7    string,
LIST    string
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|';
ALTER TABLE test6 SET serdeproperties ('serialization.encoding'='UTF-8');

Does Hive support only the first 128 characters of UTF-8? Any suggestions are welcome.

Upvotes: 8

Views: 26728

Answers (2)

Tokci

Reputation: 1280

For me, adding the following line to the table definition worked:

TBLPROPERTIES('serialization.encoding'='windows-1252')

Example code:

CREATE EXTERNAL TABLE IF NOT EXISTS test.tbl
(
    name string,
    gender string,
    age string,
    address string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION 'adl://<Data-Lake-Store>.azuredatalakestore.net/<Folder-Name>/'
TBLPROPERTIES('serialization.encoding'='windows-1252');
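If the table already exists, the same property can presumably be set without recreating it. This is only a sketch: it reuses the ALTER TABLE form and the test6 table from the question, and windows-1252 is an assumption about what the source file is really encoded in, not something the question confirms.

```sql
-- Assumption: the data files behind test6 are actually windows-1252, not UTF-8.
-- Variant 1: set it as a SerDe property, the same form the question already uses.
ALTER TABLE test6 SET SERDEPROPERTIES ('serialization.encoding'='windows-1252');

-- Variant 2: set it as a table property, matching the CREATE TABLE above.
ALTER TABLE test6 SET TBLPROPERTIES ('serialization.encoding'='windows-1252');
```

Only one of the two variants should be needed. Since the property tells Hive how to decode the stored bytes, the underlying files should not need to be rewritten, so re-running the SELECT afterwards is enough to see whether the characters now display correctly.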

Upvotes: 2

BalaramRaju

Reputation: 439

This may not be the ideal solution, but it works. Hive does not seem to treat the data as UTF-8, so try creating the table with the following parameters:

CREATE TABLE testjoins.yt_sample_mapping_1(
   `col1` string,
   `col2` string,
   `col3` string)
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
   WITH SERDEPROPERTIES ( "separatorChar" = ",", 
    "quoteChar" = "\"", 
    "escapeChar" = "\\", 
    "serialization.encoding"='ISO-8859-1') 
    TBLPROPERTIES ( 'store.charset'='ISO-8859-1', 
    'retrieve.charset'='ISO-8859-1');
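To see whether the chosen encoding is the right one, it may help to look at the bytes Hive ends up with for an affected value. The query below is a rough sketch against the question's test6 table; CUST is a hypothetical choice of column, since the question does not say which column holds the Latin text.

```sql
-- Hypothetical check: CUST is assumed to be the column with the Latin words.
-- hex() shows the bytes of the string value as Hive holds it.
SELECT CUST, hex(CUST) AS cust_bytes
FROM test6
LIMIT 10;
```

As a guide, a correctly decoded é is C3A9 in UTF-8, while EFBFBD is the Unicode replacement character, which usually means the bytes were not valid under the declared encoding. A lone byte such as E9 for é would suggest the file is ISO-8859-1 or windows-1252 rather than UTF-8.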

Upvotes: 4
