Alexaner Tipugin
Alexaner Tipugin

Reputation: 507

Postgres regex issue

I need to find all records stored in postgres, which matching following regexp:

^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$

Something like this:

SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$'

But this one falls with error:

invalid regular expression: quantifier operand invalid

Why won't Postgres work with this regex?

Using the same one in plain Ruby works just fine.

UPDATE

Problem is only with WHERE. When i try to:

SELECT '+79637434199' ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'

Postgres returns true. But when i try:

SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'

Result: "invalid regular expression: quantifier operand invalid".

Upvotes: 5

Views: 8883

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 659197

You don't need to escape - inside a character class when you put it at the first or last position, because it cannot be misread as range that way:

[\- ][- ]
[\d\- ][\d -]

The way you have it the upper bound 10 at the end is futile.
Add $ at the end to disallow trailing characters.
Or \D to disallow trailing digits (but require a non-digit).
Or ($|\D) to either end the string there or have a non-digit follow.

Put together:

SELECT '+79637434199' ~ '^(8|\+7)[ -]?(\(?\d{3}\)?[ -]?)[\d -]{7,10}($|\D)'

Otherwise your expression is just fine and it works for me on PostgreSQL 9.1.4. It should not make any difference whatsoever whether you use it in a WHERE clause or in a SELECT list - unless you are running into a bug with some old version (like @kgrittn commented).


If I prepend the string literal with E, I can provoke the error message that you get. This cannot explain your problem, because you stated that the expression works fine as SELECT item.

But, as Sherlock Holmes is quoted, "when you have excluded the impossible, whatever remains, however improbable, must be the truth."

Maybe you ran one test with standard_conforming_strings = on and the other one with standard_conforming_strings = off - this was the default interpretation of string literals in older versions before 9.1. Maybe with two different clients (that have a different setting as to that).

Read more in the chapter String Constants with C-style Escapes in the manual.

Upvotes: 6

Related Questions