greg
greg

Reputation: 3495

Postgresql's substring expression

I have a field message with strings like <pika> [SOME_TEXT_WITH|ACTION] And other stuff....

I wish to capture what's inside the brackets. I use the following form:

SELECT 
  substring(message FROM '%> \[#"[A-Z_\|]+#"\] %' FOR '#') AS my_info 
FROM my_table;

But it always fail with the same ennoying error message: «Invalid regular expression: parentheses () not balanced». What do I do wrong ?

Upvotes: 1

Views: 4636

Answers (2)

Craig Ringer
Craig Ringer

Reputation: 324455

Personally, I'd use a perl-compatible modern regexp instead of the horrid POSIX-esque regexps:

regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '\[(.*?)\]'))[1];
    regexp_matches     
-----------------------
 SOME_TEXT_WITH|ACTION
(1 row)

If you want to use the POSIX syntax you have to use the same escape consistently, not \ in some places and # in others. eg:

regress=> SELECT substring(
            '<pika> [SOME_TEXT_WITH|ACTION] And other stuff...' 
            FROM '%#"#[%#]#"%' FOR '#'
          );
        substring        
-------------------------
 [SOME_TEXT_WITH|ACTION]
(1 row)

The docs don't make it very clear that the capture operator is actually <ESCAPECHAR>", not #" specifically. This is equally valid, using a regular backslash escape:

regress=> SELECT substring(
              '<pika> [SOME_TEXT_WITH|ACTION] And other stuff...' 
              FROM '%\"\[%\]\"%' FOR '\'
          );
        substring        
-------------------------
 [SOME_TEXT_WITH|ACTION]
(1 row)

The cause of the odd error is that PostgreSQL translates the POSIX SIMILAR TO style expression into a real regular expression under the covers. Your mixed-escapes regex:

'%> \[#"[A-Z_\|]+#"\] %' FOR '#'

is being turned into something like:

'.*> \\[([A-Z_\\|]+)\\] .*'

resulting in:

regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '.*> \\[([A-Z_\\|]+)\\] .*'))[1];
ERROR:  invalid regular expression: parentheses () not balanced

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269773

I think the following does what you want:

SELECT substring(cast(message as varchar(1000)) FROM '.*\[([A-Z_\|]*)\].*'
                )
FROM my_table;

Upvotes: 1

Related Questions