QWERTZ
QWERTZ

Reputation: 11

MySQL REGEXP word boundary detection with german umlauts when using BINARY Operator

I made a strange discovery. If I execute the following SQL-Command:

SELECT 'Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]'

it gives me the result - as expected - 0

But if I do the same together with the BINARY operator:

SELECT BINARY 'Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]'

the result ist 1, so I think there is a MySQL problem with the regexp word boundary detection and german umlauts (like here the "ä") in conjunction with the BINARY Operator. As another example I can do this query:

SELECT BINARY 'Konzessionsäre' REGEXP '[[:<:]]Konzession[[:>:]]'

So here the result is 0 - as I would expect. So how can I solve this? Is this probably a bug in MySQL?

Thanks

Upvotes: 1

Views: 574

Answers (2)

Rick James
Rick James

Reputation: 142296

MySQL's REGEXP works with bytes, not characters. So, in CHARACTER SET utf8, ä is 2 bytes. It is unclear what the definition of "word boundary" in such a situation.

Recent versions of MariaDB have a better regexp engine.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562378

By casting your string as BINARY you have stripped its associated character set property. So it's unclear how the word-boundary pattern should match. I'd guess it matches only ASCII values A-Z, a-z, 0-9, and also _.

When casting the string as BINARY, MySQL knows nothing about any other higher character values that also should be considered alphanumeric, because which characters should be alphanumeric depends on the character set.

I guess you are using BINARY to make this a case-sensitive regular expression search. Apparently, this has the unintended consequence of spoiling the word-boundary pattern-match.

You should not use BINARY in this comparison. You could do a secondary comparison to check for case-sensitive matching, but not with word boundaries.

SELECT (BINARY 'Konzessionäre' REGEXP 'Konzession') AND ('Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]')

Upvotes: 1

Related Questions