Martin
Martin

Reputation: 22760

MariaDB regex with Concat only catches some results

I have a table of VARCHAR column and I need to check it for a value:

Typical data values are:

Brize Norton (501, 622, 2624, 4624, 4626)
Wyton (7006, 7010, 7630)
Waddington (2503, 7006)
Honington (2623)
Marham (2620, 7010)
Leeming (607 & 609)

The only part I need to check is that it contains the full number only. I can not check just the number because LIKE '%607%' will also incorrectly match 6070 or 2607, so I check the number and a variation of wrappers as so:

I have this query:

SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]') 

this is intended to catch any (( or, or <whitespace>, a variable number value , ) or , or <whitespace> ) in a VARCHAR column.

This works on some numbers but not on others;

An example :

:num = 2620
SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]')

Result:

"Marham (2620, 7010)"

but fails on other numbers:

 :num = 7010
SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]')

Result:

(Nothing)

How can I tell the REGEXP to catch the data shaped as above ( or, or <whitespace>, a variable number value , ) or , or <whitespace>.

I have tried EXPLAIN on my query but that doesn't help me at see the REGEXP mechanism.

I have replaced \h with \s but this doesn't make a difference.

Upvotes: 0

Views: 189

Answers (2)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627022

Your MariabDB does not support PCRE regex syntax, so only POSIX-compliant regex can be used. Neither \h nor \s are POSIX compliant, in POSIX world, \h is "equivalent" to [:blank:] and \s to [:space:].

More POSIX/PCRE character class equivalent patterns:

POSIX Character Class PCRE Description
[:alnum:] [:alnum:] / [\p{L}\p{N}] Alphanumeric
[:alpha:] \p{L} Alphabetic
[:blank:] \h Whitespace
[:cntrl:] [:cntrl:] / \p{Cc} / \p{C} Control characters
[:digit:] \d Digits
[:graph:] [:graph:] Graphic characters
[:lower:] \p{Ll} Lowercase alphabetic
[:print:] [:print:] Graphic or space characters
[:punct:] [\p{P}\p{S}] Punctuation
[:space:] \s Space, tab, newline, and carriage return
[:upper:] \p{Lu} Uppercase alphabetic
[:xdigit:] [:xdigit:] / [A-Fa-f0-9] Hexadecimal digit

You can use

REGEXP CONCAT('[(,[:blank:]]', :num, '[),[:blank:]]')
REGEXP CONCAT('[(,[:space:]]', :num, '[),[:space:]]')

If you simply want to enforce numeric boundaries use

CONCAT('([^0-9]|^)',:num, '([^0-9]|$)')
CONCAT('([^[:digit:]]|^)',:num, '([^[:digit:]]|$)')

The regex details:

  • [(,[:space:]] - a (, , or any whitespace char
  • [),[:space:]] - a ), , or any whitespace char
  • [(,[:blank:]] - a (, , or a horizontal whitespace char
  • [),[:blank:]] - a ), , or a horizontal whitespace char
  • ([^0-9]|^) / ([^[:digit:]]|^) - any non-digit char or start of string
  • ([^0-9]|$) / ([^[:digit:]]|$) - any non-digit char or end of string.

Upvotes: 1

raina77ow
raina77ow

Reputation: 106413

Essentially, it's a matter of versions, as until version 10.0.5 MariaDB used the POSIX 1003.2 compliant regular expression library. This library didn't support \h, \d etc. character classes, using their POSIX variants - [:alpha:], [:digit:] and so on.

In your case, however, it seems you might just replace \s or \h with a single whitespace in that character class:

REGEXP CONCAT('[(, ]', :num, '[), ]')

Upvotes: 3

Related Questions