GRK

Reputation: 101

Select data using utf-8 character encoding from hive

I am selecting data from my Hive table/view, but the character encoding is not picked up by spark-shell or beeline. If I select the same data from Ambari (directly through Hive), the characters display correctly. The Hive command line has been disabled for security reasons. Please see the data below:

Ambari Data:

•Construction Maintenance 
• 524 N. Martin Luther King Jr.
‘SS-MN-BAE – Other’
¿NPM¿ GOVT/GS SCD US ARM
¿MCCRAY,LORENZO

beeline data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO

Spark-shell Data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO

In spark-shell I ran:
 sql("select * from test.ACCOUNT order by customer_name desc").show()

The same select was issued in beeline and Ambari.

If anyone knows what I am doing wrong, or whether I need to set a parameter to read the proper character set, please let me know. I have tried the Java NIO charset in spark-shell, but nothing worked. Please guide me; I am pretty new to Hadoop. Is there a way to pass the character set to beeline or spark-shell on the command line before selecting the data?

Upvotes: 0

Views: 7237

Answers (2)

GRK

Reputation: 101

To read the data on Linux in the proper encoding, I set the character type in my profile after logging in, using the variables below:

export LANG="pt_PT.utf8"
export LC_ALL="pt_PT.utf8"

Then I reloaded the profile with ". .bash_profile" (or ". .profile" if the file is just .profile).
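
To confirm that the session really ended up with a UTF-8 locale before starting beeline or spark-shell, something like the sketch below may help. The function name locale_is_utf8 is made up for illustration, and the check only inspects the locale name; it does not verify that the locale is actually installed on the machine (locale -a lists those).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a locale name such as "pt_PT.utf8"
# or "en_US.UTF-8" names a UTF-8 codeset (case-insensitive match).
locale_is_utf8() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.utf8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective character locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"

if locale_is_utf8 "$effective"; then
    echo "UTF-8 locale active: $effective"
else
    echo "Not a UTF-8 locale: ${effective:-unset}"
fi
```

If the second message is printed, the exports above have probably not taken effect in the current shell yet.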

Upvotes: 2

Ajay Kharade

Reputation: 1525

This is not a Hive issue but rather a file system or file encoding issue. SELECT * in Hive does nothing except read the file from the file system, so if you run hadoop fs -cat on the underlying file, you should see the same behavior.
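
One way to test this idea is to check whether the raw bytes of the underlying file are valid UTF-8 at all. The sketch below assumes iconv is installed; the function name is_valid_utf8 and the HDFS path in the comment are placeholders, not values from the question.

```shell
#!/bin/sh
# Hypothetical helper: succeed if everything read from stdin is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence it cannot decode.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Intended use against the table's data file (placeholder path):
#   hadoop fs -cat /path/to/table/file | is_valid_utf8 && echo "valid UTF-8"

# Local demonstration with two small samples (bytes written in octal):
# "é" encoded as UTF-8 (C3 A9) versus "é" encoded as Latin-1 (E9).
for sample in 'caf\303\251\n' 'caf\351\n'; do
    if printf "$sample" | is_valid_utf8; then
        echo "valid UTF-8"
    else
        echo "not valid UTF-8"
    fi
done
```

If the file itself decodes cleanly, the stored data is fine and the question marks are more likely introduced on the client side when the output is displayed.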

Upvotes: 1
