Reputation: 19896
I am doing this in Impala or Hive. Basically let say I have a string like this
f-150:aa|f-150:cc|g-210:dd
Each element is separated by the pipe |
. Each has prefix f-150
or whatever. I want to be able to remove the prefix and keep only element that matches specific prefix. For example, if the prefix is f-150
, I want the final string after regex_replace is
aa|cc
dd
is removed because g-210
is different prefix and not match, therefore the whole element is removed.
Any idea how to do this using string expression in one SQL?
Thanks
UPDATE 1
I tried this in Impala
:
select regexp_extract('f-150:aa|f-150:cc|g-210:dd','(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}',0);
But got this output:
f-150:aa
In Hive
, I got NULL
.
Upvotes: 1
Views: 899
Reputation: 163632
You could match the values that you want to remove and then replace with an empty string:
f-150:|\|[^:]+:[^|]+$|[^|]+:[^|]+\|
f-150:|\\|[^:]+:[^|]+$|[^|]+:[^|]+\\|
Explanation
f-150:
Match literally|
Or\|[^:]+:[^|]+$
Match a pipe, not a colon one or more times followed by not a pipe one or more times and assert the end of the line|
Or[^|]+:[^|]+\|
Match not a pipe one or more times, a colon followed by matching not a pipe one or more times and then match a pipeTest with multiple lines and combinations
Upvotes: 1
Reputation: 2460
You may have to loop through the string until the end to get the all the matching sub string. Look ahead syntax is not supported in most sql so above regexp might not be suitable for SQL syntax. For you purpose you can do something like creating a table to loop through just to mimic Oracle's level syntax and join with your table containing the string.
With loop_tab as (
Select 1 loop union all
Select 2 union all
select 3 union all
select 4 union all
select 5),
string_tab as(Select 'f-150:aa|ade|f-150:ce|akg|f-150:bb|'::varchar(40) as str)
Select regexp_substr(str,'(f\\-150\\:\\w+\\|)',1,loop)
from string_tab
join loop_tab on 1=1
Output:
regexp_substr
f-150:aa|
f-150:ce|
f-150:bb|
Upvotes: 0
Reputation: 19000
The regexyou in question could look like this:
(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}
I have added some pseudo keywords to retain, but I am sure you get the idea:
|
at the beginning of an element in group 1 and put it back in the replacement with $1
.According to the documentation, your query should be written like a Java regex; likewise, this should perform like this code sample in Java.
Upvotes: 1