cnauber
cnauber

Reputation: 443

Regex, how to return a set of words present in a title?

In the example below, would the correct result be to return only the headings between id 9 and 15? the closest I got was:

((^| )(FED[A-Z]*|(ESC[A-Z]*)|(AGRO[A-Z]*)|(CENT[A-Z]*)|(CTR[A-Z]*)|(INST[A-Z]*)|(EDUC[A-Z]*)|(SCI[A-Z]*)|(TEC[A-Z]*))( |$))

Note: the FED [A-Z] * argument must always be present with the combination of at least one of the terms listed. Titles containing the term UNIV [A-Z] * must return false.

ID TEXT CURRENT RETURN EXPECTED RETURN
1 INST NACL MATEMAT PURA TRUE FALSE
2 PARA FED UNIV, TOXICOL LAB TRUE FALSE
3 CTR, BR12020330 TAUBATE TRUE FALSE
4 VICENTE LINHARES ST TRUE FALSE
5 ALBERT EINSTEIN AVE TRUE FALSE
6 REG TECN CRIMINALIST TRUE FALSE
7 NASCIMENTO CASTRO AVE TRUE FALSE
8 SAO PAULO FED UNIV, COLL AGR SCI, DEPT CROP SCI, BOTUCATU, SP, BRAZIL TRUE FALSE
9 INST FED CIENCIA TECNOL SUL MINAS, CAMPUS POCOS DE CALDAS TRUE TRUE
10 INST FED EDUC PERNAMBUCO IFPE, RECIFE TRUE TRUE
11 INST FED MINAS GERAIS, CAMPUS CONGONHAS TRUE TRUE
12 INST FED ECN TECNOL NORTE MINAS TRUE TRUE
13 CTR FED EDUC TECNOL OURO PRETO TRUE TRUE
14 FED INST AMPUS GARANHUNS TRUE TRUE
15 INST FED PERNAMBUCO BRASIL FALSE TRUE

More exemples: https://regex101.com/r/reVgya/1

Upvotes: 1

Views: 90

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627609

You can use

WHERE col RLIKE '^(?!.*\\bUNIV\\b)(?=.*\\bFED\\b).*\\b(FED|ESC|AGRO|CIE|CENT|CTR|CRT|INST|EDUC|SCI|TEC)\\b'

Here is the regex demo.

Details:

  • ^ - start of string
  • (?!.*\bUNIV\b) - a negative lookahead check: there must be NO UNIV whole word in the string
  • (?=.*\bFED\b) - a positive lookahead check: there must be a FED whole word in the string
  • .*\b(FED|ESC|AGRO|CIE|CENT|CTR|CRT|INST|EDUC|SCI|TEC)\b - after any zero or more chars other than line break chars, there must be a whole word from the list of alternations.

Upvotes: 1

Related Questions