Reputation: 8680
I have some data in Hive 1.2.1 table. I have to get raw bytes of a specific column. The column data is html raw in multiple languages. In order to get length of characters, I can use simple query like below
select baseurl, LENGTH(content) from clss limit 30;
Above query is ok for characters length the problem is for text other is English, their value is incorrect. For a Character in Arabic, it is saved as unicoded that's why character length is changed. Some characters are of two bytes and some are single byte.
Is there any builtin function to know bytes of text instead of characters ?
Upvotes: 1
Views: 2758
Reputation: 38335
Function character_length(string str)
was added in Jira HIVE-15979 And it says Fix versions 2.3.0. If you cannot upgrade your Hive (and this is quite risky), then try to download UDF source codes and build it, then add jar and create temporary function.
Download code: GenericUDFCharacterLength.java
Upvotes: 1