goks
goks

Reputation: 1206

How to use String split function on multi-delimiter strings in hive?

My string data is like below.

data = 'ABCD/~DEFG/~HJKL/~MNOP'

I tried with below query

select split(data,'[/~]')[1] from test_table;

Expected output: ABCD Original output: [SPACES]

When i tried the same query with index [2] its working.

select split(data,'[/~]')[2] from test_table;

Expected output: DEFG Original output: DEFG

My observation: Its working fine for every EVEN index number like 0,2,4.. and it is populating spaces for ODD index numbers like 1,3,5..

Can someone help me in fixing this.

Upvotes: 2

Views: 14823

Answers (1)

Ronak Patel
Ronak Patel

Reputation: 3849

You need to use split as: split('\\/~'):

hive> select split('word1/~word2/~word3','\\/~')[0] as word1;
word1

Also, check my answer: load-data-into-hive-with-custom-delimiter using MultiDelimitSerDe
and other option using regexp_extract: hive-split-string-using-regex


Examples: there should be some easy regex to achieve this but I came up with this after following example from here:

hive> select regexp_extract('word1/~word2/~word3','^(\\w.*)\\/~(\\w.*)$',2) as word3;
word3

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\\/~?){1}',1) as word1;
word1

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\\/~?){2}',1) as word2;
word2

Upvotes: 6

Related Questions