Reputation: 101
I am selecting data from my Hive table/view, but the character encoding is not picked up by spark-shell or Beeline. The same data displays correctly when I select it from Ambari (directly through Hive); command-line Hive has been disabled for security reasons. Please see the data below:
Ambari Data:
•Construction Maintenance
• 524 N. Martin Luther King Jr.
‘SS-MN-BAE – Other’
¿NPM¿ GOVT/GS SCD US ARM
¿MCCRAY,LORENZO
Beeline Data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO
Spark-shell Data:
?Construction Mai...
? 524 N. Martin L...
?SS-MN-BAE ? Other?
?NPM? GOVT/GS SCD...
?MCCRAY,LORENZO
Using spark-shell I ran:
sql("select * from test.ACCOUNT order by customer_name desc").show()
The same SELECT was issued in Beeline and Ambari.
If anyone knows what I am doing wrong, or whether I need to set a parameter to read the proper character set, please let me know. I have tried the java.nio charset classes in spark-shell, but nothing worked. Please guide me; I am pretty new to Hadoop. Is there a way I can pass the character set to Beeline or spark-shell through the command line before selecting the data?
Upvotes: 0
Views: 7237
Reputation: 101
To read the data in Linux with the proper encoding, I set the character type in my profile after logging in, using the variables below:
export LANG="pt_PT.utf8"
export LC_ALL="pt_PT.utf8"
and reloaded the profile: if it is .bash_profile, run . .bash_profile; if it is just .profile, run . .profile.
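For example, a minimal session might look like the sketch below; the pt_PT.utf8 locale is from my setup, so substitute whatever UTF-8 locale your system provides, and use .profile instead of .bash_profile if that is what your shell reads:

    # append the locale exports to the profile (here .bash_profile)
    echo 'export LANG="pt_PT.utf8"' >> ~/.bash_profile
    echo 'export LC_ALL="pt_PT.utf8"' >> ~/.bash_profile

    # reload the profile in the current shell
    . ~/.bash_profile

    # verify the locale actually took effect before launching Beeline or spark-shell
    locale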
Upvotes: 2
Reputation: 1525
This is not a Hive issue, but rather a file system or file encoding issue. SELECT * in Hive does nothing except read the file from the file system, so if you run hadoop fs -cat on your underlying file, you should see the same behavior.
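A quick way to confirm this is sketched below; the warehouse path is an assumption based on the default Hive layout, so adjust it to wherever test.ACCOUNT actually lives:

    # cat the table's underlying file straight from HDFS (path is hypothetical)
    hadoop fs -cat /apps/hive/warehouse/test.db/account/000000_0 | head -5

    # dump the raw bytes of the first line to see whether the UTF-8 sequences are intact
    hadoop fs -cat /apps/hive/warehouse/test.db/account/000000_0 | head -1 | hexdump -C

If the bytes are valid UTF-8 here but still render as ? in your terminal, the problem is the client-side locale rather than the stored data.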
Upvotes: 1