Reputation: 3495
I have a field message
with strings like <pika> [SOME_TEXT_WITH|ACTION] And other stuff...
.
I wish to capture what's inside the brackets. I use the following form:
SELECT
substring(message FROM '%> \[#"[A-Z_\|]+#"\] %' FOR '#') AS my_info
FROM my_table;
But it always fail with the same ennoying error message: «Invalid regular expression: parentheses () not balanced». What do I do wrong ?
Upvotes: 1
Views: 4636
Reputation: 324455
Personally, I'd use a perl-compatible modern regexp instead of the horrid POSIX-esque regexps:
regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '\[(.*?)\]'))[1];
regexp_matches
-----------------------
SOME_TEXT_WITH|ACTION
(1 row)
If you want to use the POSIX syntax you have to use the same escape consistently, not \
in some places and #
in others. eg:
regress=> SELECT substring(
'<pika> [SOME_TEXT_WITH|ACTION] And other stuff...'
FROM '%#"#[%#]#"%' FOR '#'
);
substring
-------------------------
[SOME_TEXT_WITH|ACTION]
(1 row)
The docs don't make it very clear that the capture operator is actually <ESCAPECHAR>"
, not #"
specifically. This is equally valid, using a regular backslash escape:
regress=> SELECT substring(
'<pika> [SOME_TEXT_WITH|ACTION] And other stuff...'
FROM '%\"\[%\]\"%' FOR '\'
);
substring
-------------------------
[SOME_TEXT_WITH|ACTION]
(1 row)
The cause of the odd error is that PostgreSQL translates the POSIX SIMILAR TO
style expression into a real regular expression under the covers. Your mixed-escapes regex:
'%> \[#"[A-Z_\|]+#"\] %' FOR '#'
is being turned into something like:
'.*> \\[([A-Z_\\|]+)\\] .*'
resulting in:
regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '.*> \\[([A-Z_\\|]+)\\] .*'))[1];
ERROR: invalid regular expression: parentheses () not balanced
Upvotes: 2
Reputation: 1269773
I think the following does what you want:
SELECT substring(cast(message as varchar(1000)) FROM '.*\[([A-Z_\|]*)\].*'
)
FROM my_table;
Upvotes: 1