AlxRd
AlxRd

Reputation: 285

Hive SQL Extract string of varying length between two non-alphanumeric characters

I would like to extract strings of varying length located between two repeating underscores in Hive QL. Below I show a sampling of the pattern of the rows. Specifically, I would like to extract the string between the 3rd and 4th underscores. Thanks!

2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250 
2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600
2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600 

Just kept trying and I modified this from previous responses to non-Hive SQL. I am still interested in knowing better ways of doing this. Note that creative_str is the name of the column:

select creative_str, ltrim(rtrim(substring(regexp_replace(cast(creative_str as varchar(1000)), '_', repeat(cast(' ' as varchar(1000)),10000)), 30001, 10000))) 
from impression_cr

Upvotes: 2

Views: 1322

Answers (1)

nxl4
nxl4

Reputation: 734

You should be able to do this with Hive's SPLIT() function. If you're trying to grab the value between the third and fourth underscores, this will do it:

SELECT  SPLIT("2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250", "[_]")[3],
        SPLIT("2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600", "[_]")[3],
        SPLIT("2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600", "[_]")[3]

Upvotes: 2

Related Questions