bernie2436
bernie2436

Reputation: 23901

What is the meaning of the \\+ in this regex?

I am trying to parse this example regex.

I know that slashes can be used as escape characters. So if you wanted to search for ) without implying a grouping you would do \ and then ) spelling this out to avoid stack overflow regex...

I also know that a plus sign can indicate one or more of the preceding item.

But in the example below, is the plus sign or the slash getting escaped? It seems like the first slash allows you to "escape" the second slash and then the plus sign indicates that there is at least one prior slashes --- but the example says there are at least two + in the string...

What does this regex mean? There are too many new things going on for me to parse it.

enter image description here

Upvotes: 5

Views: 12789

Answers (2)

Craig Ringer
Craig Ringer

Reputation: 324375

That regexp can actually mean two different things, depending on the PostgreSQL version and the value of standard_conforming_strings.

Old versions (before standard_conforming_strings or those that defaulted to off) would interpret the string as a backslash-escaped string. So PostgreSQL would turn \\+\\+* into \+\+*, i.e. it'd consume a level of escaping. Then the regular expression would consume the remaining level to escape the pluses, so they're interpreted as literal +s not qualifiers. That regexp says ++ followed by zero or more other characters.

Newer versions with standard_conforming_strings defaulting to on will, per the SQL standard, not decode the backslashes as escapes. So you'll run the regexp \\+\\+*, which is one or more backslashes, followed by one or more backslashes, followed by ... oops, the asterisk without a preceding character is an error.

So we know you must have standard_conforming_strings off, 'cos the query would fail to compile the regexp on a new one.

regress=> SELECT 'blah' ~ '\\+\\+*';
ERROR:  invalid regular expression: quantifier operand invalid

postgres=> SHOW standard_conforming_strings;
 standard_conforming_strings 
-----------------------------
 on
(1 row)

You'll have this problem later on, so I suggest dealing with it before you upgrade.

Assuming that the x_spam_level field always starts with the pluses, which the regexp doesn't check, that code might be better written as:

x_spam_level LIKE '++%'

If it doesn't start with the pluses use:

x_spam_level LIKE '%++%'

which is what the current regexp is doing. PostgreSQL will turn that into a regular expression internally, but you don't have to worry about the escaping.

If you want to use a regular expression and have it behave consisently across all versions, use:

x_spam_level ~ E'\\+\\+*'

The E'' syntax tells PostgreSQL to decode backslash escapes, irrespective of the standard_conforming_strings setting.

Upvotes: 2

dee-see
dee-see

Reputation: 24078

But in the example below, is the plus sign or the slash getting escaped?

Both!

The \ is escaped because the query language you are using probably uses it as an escape character itself (i.e. to escape quotes). So \\ is understood as a single \ in the regex, which is then used to escape the +. The regex means a single + followed by zero or many +.

It could probably be rewritten as \\++ where the second + is actually the regex quantifier.

Upvotes: 10

Related Questions