Mike Wise
Mike Wise

Reputation: 22827

Assessing from the end of a split array in Hive

I need to split a tag that looks something like "B1/AHU/_1/RoomTemp", "B1/AHU/_1/109/Temp", so with a variable with a variable number of fields. I am interested in getting the last field, or sometimes the last but one. I was disappointed to find that negative indexes do not count from the right and allow me to select the last element of an array in Hive as they do in Python.

select tag,split(tag,'[/]')[ -1] from sensor

I was more surprised when this did not work either:

select tag,split(tag,'[/]')[ size(split(tag,'[\]'))-1 ] from sensor

Both times giving me an error along the lines of this:

FAILED: SemanticException 1:27 Non-constant expressions for array indexes not supported. 
Error encountered near token '1'

So any ideas? I am kind of new to Hive. Regex's maybe? Or is there some syntactic sugar I am not aware of?

Upvotes: 6

Views: 10319

Answers (4)

henry zhu
henry zhu

Reputation: 651

After reading the LanguageManual UDF a while, I luckily found the function substring_index exactly meets your requirement, dosen't need any additional calculations at all.

The manual says:

substring_index(string A, string delim, int count) returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.

Use cases:

SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql

SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com

See here for more information.

Upvotes: 1

Mike Wise
Mike Wise

Reputation: 22827

This question is getting a lot of views (over a thousand now), so I think it needs a proper answer. In the event I solved it with this:

select tag,reverse(split(reverse(tag),'[/]')[0]) from sensor

which is not actually stated in the other suggested answers - I got the idea from a suggestion in the comments.

This:

  • reverses the string (so "abcd/efgh" is now "hgfe/dcba")
  • splits it on "/" into an array (so we have "hgfe" and "dcba")
  • extracts the first element (which is "hgfe")
  • then finally re-reverses (giving us the desired "efgh")

Also note that the second-to-last element can be retrieved by substituting 1 for the 0, and so on for the others.

Upvotes: 12

dedricF
dedricF

Reputation: 161

This seem to work for me, this returns the last element from the SPLIT array

SELECT SPLIT(INPUT__FILE__NAME,'/')[SIZE(SPLIT(INPUT__FILE__NAME,'/')) -1 ] from test_table limit 10;

Upvotes: 3

o-90
o-90

Reputation: 17593

There is a great library of Hive UDFs here. One of them is LastIndexUDF(). It's pretty self-explainatory, it retrieves the last element of an array. There are instructions to build and use the jar on the main page. Hope this helps.

Upvotes: 3

Related Questions