Select data using UTF-8 character encoding from Hive

Question

I am selecting data from my Hive table/view, but the character encoding is not picked up by spark-shell or beeline. If I select the same data from Ambari (directly through Hive), the characters display correctly. Hive itself has been disabled from the command line for security reasons. Please see the data below:

Ambari Data:

•Construction Maintenance 
• 524 N. Martin Luther King Jr.
‘SS-MN-BAE – Other’
¿NPM¿ GOVT/GS SCD US ARM
¿MCCRAY,LORENZO

beeline data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO

Spark-shell Data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO

In spark-shell I ran:
 sql("select * from test.ACCOUNT order by customer_name desc").show()

The same select was issued in beeline and Ambari.

If anyone knows what I am doing wrong, or whether I need to set a parameter to read the proper character set, please let me know. I have tried java.nio.charset in spark-shell, but nothing worked. Please guide me; I am pretty new to Hadoop. Is there a way to pass the character set to beeline or spark-shell on the command line before selecting the data?

GRK · Accepted Answer

To read the data on Linux in the proper encoding, I set the locale in my profile, after logging in to the Linux host, using the variables below:

export LANG="pt_PT.utf8"

export LC_ALL="pt_PT.utf8"

Then I reloaded the profile: if it is a .bash_profile, run ". .bash_profile"; if it is just a .profile, run ". .profile".
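
Before launching beeline or spark-shell again, it may help to confirm that the reloaded session really requests a UTF-8 locale. The following is only a minimal sketch: the helper name is_utf8_locale is hypothetical (it is not part of Hive, Spark, or any standard tool), and it inspects only the locale name, not whether that locale is actually installed on the host.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not a Hive/Spark command):
# succeeds when a locale name such as "pt_PT.utf8" or "en_US.UTF-8"
# names a UTF-8 codeset.
is_utf8_locale() {
  case "$1" in
    *.[Uu][Tt][Ff]8*|*.[Uu][Tt][Ff]-8*) return 0 ;;
    *) return 1 ;;
  esac
}

# For character handling, LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_locale="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"

if is_utf8_locale "$effective_locale"; then
  echo "UTF-8 locale requested: $effective_locale"
else
  echo "No UTF-8 locale requested (found: '$effective_locale')"
fi
```

On most Linux systems, running "locale charmap" afterwards reports the character map that is actually in effect, which also catches the case where the requested locale is not installed.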
