Khom Nazid
Khom Nazid

Reputation: 630

REGEX to find words containing permutations of letters, with one letter mandatory, and NOT any other

We have a mysql table with a column that has alphanumeric text. We have a unique use case where we need to find every row in the table where this column has text that contains:

As such, these would be OK:

SCAN
SCATTCC
NCAS 
NTTAAS

These would NOT be OK:

CATI     (does not contain S, and contains I which is outside of S, T, C, A, N) 
SCANNF   (contains F which is outside of S, T, C, A, N) 
NCASO    (contains O which is outside of S, T, C, A, N) 
..

We tried something like this but it doesn't work:

[tacn]*[s]+(?![^stacn])+

Upvotes: 0

Views: 304

Answers (3)

Ashutosh Parida
Ashutosh Parida

Reputation: 104

Use this regex for pattern matching :

pattern = "^(S+[TACN]+S*[TACN]*)+$|^(S*[TACN]+S+[TACN]*)+$"

Upvotes: 0

Nick
Nick

Reputation: 147216

MySQL 8.0.4+

From MySQL 8.0.4 onwards, MySQL regexp support changed to the ICU library, and lookaheads are supported. For those versions, this regex will meet your requirements:

'^(?=.*s)(?=.*[acnt])(?!.*[^acnst])'

It uses 3 lookaheads:

  1. (?=.*s) asserts there is an S in the string;
  2. (?=.*[acnt]) asserts there is at least one of [ACNT] in the string;
  3. (?!.*[^acnst]) asserts that there are no characters other than [ACNST] in the string.

Demo on dbfiddle

MySQL prior to 8.0.4

This regex will give you the results you want:

^[tacn]*(s[cant]|[tacn]s)[cant]*$

It looks for either

  • an S preceded by zero or more of [TACN] and followed by one or more of [TACN]; or
  • an S preceded by one or more of [TACN] and followed by zero or more of [TACN]

Query:

SELECT str, 
       str REGEXP '^[tacn]*(s[cant]|[tacn]s)[cant]*$' AS `match`
FROM test

Output:

str     match
SCAN    1
SCATTCC 1
NCAS    1
NTTAAS  1
CATI    0
SCANNF  0
NCASO   0
CANT    0
S       0
SS      0
TS      1
SC      1
STS     0

Demo on dbfiddle

If you want to allow S to occur more than once, just change the regex to

^[stacn]*(s[cant]|[tacn]s)[scant]*$

This changes the result for STS to 1, while leaving the result for SS as 0 since it doesn't contain a character other than S.

Demo on dbfiddle

Upvotes: 3

C Perkins
C Perkins

Reputation: 3882

This assumes that s can appear multiple times just like the other allowed characters. The requirements do not explicitly exclude that possibility although the examples do not include such an example. I do not interpret "One letter for sure" to mean "appears only once".

^(s+[tacn][stacn]*|[tacn]+s[stacn]*)$

Upvotes: 2

Related Questions