A Basel

Reputation: 23

Regex: searching for exactly 3 capital letters

I am using https://pythex.org/ to test the following:

I want to search the test string below and select every run of consecutive capital letters that is exactly three letters long.

Test string: bcAc BBGFeQFFJaaBx EBBcDDD

[A-Z]{3} matches every group of 3 uppercase characters, including the first 3 letters of a run of 4. I have tried to force exactly 3 capitals using ^[A-Z]{3}$, but this does not work either.

In the above string I only want EBB and DDD to be matched.
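
For reference, both attempts can be reproduced with Python's re module (pythex.org tests Python regular expressions, so the behaviour should be the same). This is only a minimal sketch; the variable names are illustrative.

```python
import re

text = "bcAc BBGFeQFFJaaBx EBBcDDD"

# [A-Z]{3} knows nothing about the characters around the match, so it
# also takes the first three letters of the four-letter runs BBGF and QFFJ.
loose = re.findall(r"[A-Z]{3}", text)
print(loose)     # ['BBG', 'QFF', 'EBB', 'DDD']

# ^ and $ anchor to the whole string, so this can only match when the
# entire text consists of exactly three capital letters.
anchored = re.findall(r"^[A-Z]{3}$", text)
print(anchored)  # []
```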

Upvotes: 2

Views: 5879

Answers (3)

Toothpick Anemone

Reputation: 4644

If you want only groups of exactly three capital letters (such as EBB or DDD) to be matched, then think about which character comes before and after the group of 3 capital letters.

[^A-Z] means "exactly one character which is not a capital letter."

Thus the following two things are equivalent:

Regex:   [^A-Z][A-Z]{3}[^A-Z]
English: (1) one thing which is not a capital letter, (2) three capital letters, (3) one thing which is not a capital letter

I apologize for the formatting of the numbered list inside of the table. stackoverflow supports numbered lists outside of tables, but not inside of tables. Also, I cannot insert manual line-break inside of a stackoverflow table.

Anyway, our regular expression is still not ideal.

GENERAL PROBLEM: The non-capital letters are captured.
EXAMPLE OF PROBLEM: If you search inside of the string xxxxxbAAAbxxxxx then the match is bAAAb instead of AAA.

GENERAL PROBLEM: The beginning of a string and the end of a string pose a problem.
EXAMPLE OF PROBLEM: If you search inside of AAAxxxxBBBxxx we will find BBB but not AAA, because no character comes before AAA.
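
Both problems are easy to reproduce in Python. A small sketch, assuming the re module; the names naive, first and second are just for illustration.

```python
import re

naive = re.compile(r"[^A-Z][A-Z]{3}[^A-Z]")

# Problem 1: the neighbouring non-capital characters are part of the match.
first = naive.findall("xxxxxbAAAbxxxxx")
print(first)   # ['bAAAb']

# Problem 2: AAA sits at the very start, so there is no character for the
# leading [^A-Z] to match and AAA is skipped entirely.
second = naive.findall("AAAxxxxBBBxxx")
print(second)  # ['xBBBx']
```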

One solution is the following:

(?:^|[^A-Z])([A-Z]{3})(?:[^A-Z]|$)

Consider the text John [email protected]

  • We want to delete the dots (...).
  • We want to keep John Doe and 303-159-8712 etc...

What are capturing groups?

  • A capturing group labels text as "keep me"
  • A non-capturing group marks text as "delete me": the text still has to match, but it is not saved in the captured result

Regex          English
|              The vertical pipe character is similar to the English word "or"
^              The beginning (left end) of the text
$              The end of the text (right-most end of the string)
[A-Z]          One capital letter from A through Z
[^A-Z]         One character other than a capital letter from A through Z
[^A-Z]|$       One character other than a capital letter from A through Z, or the end of the text
^|[^A-Z]       The beginning of the text, or one character other than a capital letter from A through Z
(?:[^A-Z]|$)   The question-mark-colon (?:) says that this is a chunk of text we want to delete/ignore. The chunk of text contains only one thing: either the end of the text or one character other than a capital letter.
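
To make the "keep me" / "delete me" idea concrete, here is a hedged sketch in Python's re module. One caveat that the explanation above does not mention: the non-capturing groups still consume the characters they match, so two groups separated by a single character (EBBcDDD in the question) cannot both be found. The lookahead_variant below is my own adjustment, not part of the answer; it swaps the trailing group for a lookahead so the separating character is left available for the next match.

```python
import re

pattern = re.compile(r"(?:^|[^A-Z])([A-Z]{3})(?:[^A-Z]|$)")

m = pattern.search("xxxxxbAAAbxxxxx")
whole = m.group(0)  # everything the regex matched: 'bAAAb'
kept = m.group(1)   # only the capturing group:     'AAA'

# With one capturing group, findall returns just that group, and the
# ^ alternative lets a group at the start of the string match as well.
at_start = pattern.findall("AAAxxxxBBBxxx")
print(at_start)     # ['AAA', 'BBB']

# Caveat: the 'c' after EBB is consumed by the trailing group, so nothing
# is left to serve as the non-capital character in front of DDD.
question_text = "bcAc BBGFeQFFJaaBx EBBcDDD"
consumed = pattern.findall(question_text)
print(consumed)     # ['EBB']

# Assumed variant (not from the answer): check the following character
# with a lookahead, which does not consume it.
lookahead_variant = re.compile(r"(?:^|[^A-Z])([A-Z]{3})(?=[^A-Z]|$)")
fixed = lookahead_variant.findall(question_text)
print(fixed)        # ['EBB', 'DDD']
```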

You can test it here

Upvotes: 0

garroad_ran

Reputation: 174

Not sure if this is the best solution, but I used negative lookbehinds and lookaheads to make sure that there were no capital letters in front of or behind the three-letter bits.

(?<![A-Z])[A-Z]{3}(?![A-Z])
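
A quick check of this pattern against the string from the question, sketched with Python's re module. Lookarounds only inspect the neighbouring characters without consuming them, so the match itself is just the three letters.

```python
import re

text = "bcAc BBGFeQFFJaaBx EBBcDDD"
pattern = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")

# Collect each match together with the index where it starts.
matches = [(m.group(), m.start()) for m in pattern.finditer(text)]
print(matches)  # [('EBB', 19), ('DDD', 23)]
```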

Upvotes: 0

revo

Reputation: 48711

Use negative lookarounds:

[A-Z]{3}(?<![A-Z]{4})(?![A-Z])
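
As I read it, the pattern first matches three capitals, then (?<![A-Z]{4}) looks back four characters from the end of the match and rejects it if all four are capitals (that is, if a capital came right before the match), and (?![A-Z]) rejects it if a capital follows. The sketch below, in Python's re module, compares it with a lookbehind-first pattern on a few sample strings; the samples other than the question's test string are made up for illustration.

```python
import re

revo = re.compile(r"[A-Z]{3}(?<![A-Z]{4})(?![A-Z])")
both_sides = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")

samples = ["bcAc BBGFeQFFJaaBx EBBcDDD", "AAAxxxxBBBxxx", "ABCD", "ABC"]

# Map each sample to the matches found by each pattern.
results = {s: (revo.findall(s), both_sides.findall(s)) for s in samples}
for s, (a, b) in results.items():
    print(repr(s), a, b)
```

On these samples both patterns return the same matches, for example ['EBB', 'DDD'] for the string in the question and nothing for ABCD.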

Live demo

Upvotes: 4
