Reputation: 503
Hi, the problem is as follows: the data I want to insert into a Hive table contains Latin words and is UTF-8 encoded, but Hive still does not display it properly.
[Image: data inserted in Hive]
I also changed the table's encoding to UTF-8, but the issue is the same. Below are the Hive DDL and commands:
CREATE TABLE IF NOT EXISTS test6
(
CONTACT_RECORD_ID string,
ACCOUNT string,
CUST string,
NUMBER string,
NUMBER1 string,
NUMBER2 string,
NUMBER3 string,
NUMBER4 string,
NUMBER5 string,
NUMBER6 string,
NUMBER7 string,
LIST string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
ALTER TABLE test6 SET SERDEPROPERTIES ('serialization.encoding'='UTF-8');
Does Hive support only the first 128 characters of UTF-8? Please advise.
Upvotes: 8
Views: 26728
Reputation: 1280
For me, adding the following line worked:
TBLPROPERTIES('serialization.encoding'='windows-1252')
Example code:
CREATE EXTERNAL TABLE IF NOT EXISTS test.tbl
(
name string,
gender string,
age string,
address string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION 'adl://<Data-Lake-Store>.azuredatalakestore.net/<Folder-Name>/'
TBLPROPERTIES('serialization.encoding'='windows-1252');
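One possible reason this setting helps, assuming the source file was actually written as windows-1252 rather than UTF-8: the same bytes decode to different text depending on which charset the reader assumes. The snippet below is only a sketch of that effect outside Hive. The sample string is made up for illustration, and `cp1252` is Python's name for windows-1252.

```python
# Hypothetical sample value; any text with accented Latin characters behaves the same way.
text = "José Muñoz"

# Bytes as a windows-1252 export would write them: one byte per character.
cp1252_bytes = text.encode("cp1252")

# Reading those bytes as UTF-8 hits invalid sequences; with errors="replace"
# each bad byte becomes U+FFFD, the replacement character seen as garbage output.
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Reading them with the encoding they were written in restores the original text.
as_cp1252 = cp1252_bytes.decode("cp1252")
```

If the accented characters only come back intact in `as_cp1252`, the file is not UTF-8, and telling Hive the real encoding is the right fix.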
Upvotes: 2
Reputation: 439
This may not be the ideal solution, but it works. Hive somehow doesn't seem to treat the data as UTF-8. Try creating the table with the following parameters:
-- LazySimpleSerDe reads 'field.delim' and 'escape.delim'; 'separatorChar',
-- 'quoteChar' and 'escapeChar' are OpenCSVSerde properties and are ignored by it.
CREATE TABLE testjoins.yt_sample_mapping_1(
`col1` string,
`col2` string,
`col3` string)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
WITH SERDEPROPERTIES ( "field.delim" = ",",
"escape.delim" = "\\",
"serialization.encoding"='ISO-8859-1')
TBLPROPERTIES ( 'store.charset'='ISO-8859-1',
'retrieve.charset'='ISO-8859-1');
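Before picking a value for 'serialization.encoding', it can help to check what the file really contains. Here is a rough sketch, assuming you can read the raw bytes of the data file locally. The function names and the ISO-8859-1 fallback are my own choices for illustration, not anything Hive provides.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def suggest_serde_encoding(raw: bytes, fallback: str = "ISO-8859-1") -> str:
    """Suggest a value for 'serialization.encoding' (heuristic only).

    Valid UTF-8 is reported as UTF-8; anything else gets the fallback,
    which is an assumption and may need to be windows-1252 or another charset.
    """
    return "UTF-8" if looks_like_utf8(raw) else fallback
```

To use it, open the file in binary mode, read the whole content (a partial read can cut a multi-byte character in half and give a false negative), and pass the bytes to suggest_serde_encoding. A file that fails the UTF-8 check was not really saved as UTF-8, whatever the editor claimed.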
Upvotes: 4