Separating fields out of a string in Hive

Question

I have the following problem... I work with Hive and want to add a file with several (different) rows of Strings. Those contain fields with a fixed size, like this:

A20130420bcd   34  fgh

where the fields have the length 1,8,6,4,3. Separated it would look like this:

"A,20130420,bcd,fgh"

Is there any possibility to read the String and sort it into a field besides getting it as a substring for every field like

substring(col_value,1,1) Field1

etc? I would imagine that cutting the already read part of the string would increase the performance, but i could think of any way to do this with the given functions here.

Secondly, as stated before, there are different types of strings, ordered and identified by the first character.right now just check those with the WHERE-Statement, but it's horrible, as it runs through the whole file just to find only the first String. Is there any way to read specific lines by their number? If i know, that the first string will be of a certain kind, can read it directly?

right it looks like this:

insert overwrite table TEST 
SELECT 
substring(col_value,1,1) field1, 
... 
substring(col_value,10,3) field 5 
from temp_data WHERE substring(col_value,1,1) = 'A';

any ideas on this?

I would love to hear some ideas =)

WeiChing 林煒清 · Accepted Answer

You need to write yours generic-UDF parser that output the struct or map or whatever appropriate. you can refer to UDF that output multi-values.

then you can write

insert overwrite table output
select parsed.first, parsed.second
from (
  select parse(taget)
  from input
) parsed
where first='X';

About second question,you may need to check "explain" command of hive to see if hive do filter push-down for you.(just see how many map reduce it takes, theoretically it should be one map, depending on 1.hive version, 2.output table format .)

In general sense, this is why database is popular -- take optimization into consideration for you .

Separating fields out of a string in Hive

Answers (1)

Related Questions