How to use String split function on multi-delimiter strings in hive?

Question

My string data is like below.

data = 'ABCD/~DEFG/~HJKL/~MNOP'

I tried with below query

select split(data,'[/~]')[1] from test_table;

Expected output: ABCD Original output: [SPACES]

When i tried the same query with index [2] its working.

select split(data,'[/~]')[2] from test_table;

Expected output: DEFG Original output: DEFG

My observation: Its working fine for every EVEN index number like 0,2,4.. and it is populating spaces for ODD index numbers like 1,3,5..

Can someone help me in fixing this.

Ronak Patel · Accepted Answer

You need to use split as: split('\/~'):

hive> select split('word1/~word2/~word3','\/~')[0] as word1;
word1

Also, check my answer: load-data-into-hive-with-custom-delimiter using MultiDelimitSerDe
and other option using regexp_extract: hive-split-string-using-regex

Examples: there should be some easy regex to achieve this but I came up with this after following example from here:

hive> select regexp_extract('word1/~word2/~word3','^(\w.*)\/~(\w.*)$',2) as word3;
word3

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\/~?){1}',1) as word1;
word1

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\/~?){2}',1) as word2;
word2

How to use String split function on multi-delimiter strings in hive?

Answers (1)

Related Questions