Reputation: 1
I am trying to extract a substring from a text string in postgresql. The column name of the text string is short_description and I am using the REGEXP_SUBSTR function to define a regex that will return only the portion that I want:
SELECT short_description,
REGEXP_SUBSTR(short_description,'\\[[^=[]*') AS space
FROM my_table
This returns the following:
short_description space
----------------------------------------------------------------------------
[ABC12][1][ABCDEFG] ACB DEF [HIJ] | [ABC12]
What I would like to pull is the following:
short_description space
----------------------------------------------------------------------------
[ABC12][1][ABCDEFG] ACB DEF [HIJ] | [ABCDEFG]
Any ideas?
Upvotes: 0
Views: 7961
Reputation: 14035
You can use the Regex character classes to help with this kind of match. Here I'm looking for letters only, surrounded by brackets, and a following space. Note the use of double backslash \\
to escape the literal brackets and the double brackets [[:a:]]
for the character class
SELECT REGEXP_SUBSTR('[ABC12][1][ABCDEFG] ACB DEF [HIJ]','\\[[[:alpha:]]+\\] ');
regexp_substr
---------------
[ABCDEFG]
You could also use the SPLIT_PART function achieve something similar by splitting on a closing bracket ]
and choosing the 3rd value.
SELECT SPLIT_PART('[ABC12][1][ABCDEFG] ACB DEF [HIJ]',']',3);
split_part
------------
[ABCDEFG
I recommend using the built in functions rather than a UDF if at all possible. UDFs are fantastic when you need them but they do incur a performance penalty.
Upvotes: 1
Reputation: 4354
Here you go. I have found the right regexp expression using https://txt2re.com
Then, I have implemented it as a python redshift UDF
create or replace function f_regex (input_str varchar(max),regex_expression varchar(max))
returns VARCHAR(max)
stable
as $$
import re
rg = re.compile(regex_expression,re.IGNORECASE|re.DOTALL)
return rg.search(input_str).group(1)
$$ language plpythonu;
select f_regex('[ABC12][1][ABCDEFG] ACB DEF [HIJ] '::text,'.*?\\[.*?\\].*?\\[.*?\\](\\[.*?\\])'::text);
Once you have created the function, you can use it within any of your redshift selects.
So, in your case:
SELECT short_description,
f_regex(short_description::text,'.*?\\[.*?\\].*?\\[.*?\\](\\[.*?\\])'::text) AS space
FROM my_table
Upvotes: 0