Reputation: 23
I am using https://pythex.org/ to test the following:
I want to search the test string below and select every run of consecutive capital letters that is exactly three letters long.
Test string: bcAc BBGFeQFFJaaBx EBBcDDD
[A-Z]{3}
returns all groups of 3 uppercase characters, even those that are part of a run of 4. I have tried to force only 3 capitals using ^[A-Z]{3}$ but this does not work either.
In the above string I only want EBB and DDD to be matched.
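For anyone who wants to reproduce this outside of pythex, here is a minimal sketch using Python's `re` module. The variable names are only illustrative and are not from the original post.

```python
import re

# Test string from the question.
text = "bcAc BBGFeQFFJaaBx EBBcDDD"

# [A-Z]{3} matches any three consecutive capitals, including the first
# three letters of the longer runs BBGF and QFFJ.
loose = re.findall(r"[A-Z]{3}", text)
print(loose)  # ['BBG', 'QFF', 'EBB', 'DDD']

# ^ and $ anchor to the start and end of the whole string, so this
# pattern only matches if the entire string is exactly three capitals.
anchored = re.findall(r"^[A-Z]{3}$", text)
print(anchored)  # []
```

So the first pattern is too loose because it ignores what surrounds the three letters, and the second is too strict because it describes the whole string rather than a piece of it.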
Upvotes: 2
Views: 5879
Reputation: 4644
If you want only groups of exactly three capital letters (such as EBB or DDD) to be matched, then think about which character comes before or after the group of 3 capital letters.
[^A-Z]
means "exactly one character which is not a capital letter."
Thus the following two things are equivalent:
Regex | English |
---|---|
[^A-Z][A-Z]{3}[^A-Z] | (1) one thing which is not a capital letter (2) three capital letters (3) one thing which is not a capital letter |
I apologize for the formatting of the numbered list inside of the table. Stack Overflow supports numbered lists outside of tables, but not inside of tables. Also, I cannot insert a manual line break inside of a Stack Overflow table.
Anyway, our regular expression is still not ideal.
GENERAL PROBLEM | EXAMPLE OF PROBLEM |
---|---|
The non-capital letters are captured. | If you search inside of the string xxxxxbAAAbxxxxx then output might contain bAAAb instead of AAA |
The beginning and end of a string pose a problem. | If you search inside of AAAxxxxBBBxxx we will find BBB but not AAA |
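Both problems in the table can be checked with a short Python sketch. The example strings are the ones from the table; the variable names are made up for illustration.

```python
import re

pattern = re.compile(r"[^A-Z][A-Z]{3}[^A-Z]")

# Problem 1: the neighbouring non-capital characters are part of the match.
surrounded = pattern.findall("xxxxxbAAAbxxxxx")
print(surrounded)  # ['bAAAb']

# Problem 2: AAA sits at the very start of the string, so there is no
# character in front of it for [^A-Z] to match, and it is skipped.
at_edge = pattern.findall("AAAxxxxBBBxxx")
print(at_edge)  # ['xBBBx']
```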
One solution is the following:
(?:^|[^A-Z])([A-Z]{3})(?:[^A-Z]|$)
Consider the text John [email protected]
...
).John Doe
and 303-159-8712
etc...
Regex | English |
---|---|
\| | The vertical pipe character is similar to the English word "or" |
^ | beginning (left end) of the text |
$ | end of the text (right-most end of the string) |
[A-Z] | One capital letter from A through Z |
[^A-Z] | One character other than a capital letter from A through Z |
[^A-Z]\|$ | One character other than a capital letter from A through Z, or the end of the text |
^\|[^A-Z] | The beginning of the text, or one character other than a capital letter from A through Z |
(?:[^A-Z]\|$) | The question-mark-colon (?: ) marks a non-capturing group: the chunk of text is matched, but it is not saved as a numbered group, so we can ignore it. The chunk contains only one thing, which is either the end of the text or one character other than a capital letter. |
You can test it here
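As a rough check in Python (a sketch only, with illustrative variable names), `re.findall` returns just the text of the single capturing group, so the surrounding characters are dropped and the string edges are handled:

```python
import re

pattern = re.compile(r"(?:^|[^A-Z])([A-Z]{3})(?:[^A-Z]|$)")

# With exactly one capturing group, findall returns only that group's text.
middle = pattern.findall("xxxxxbAAAbxxxxx")
print(middle)  # ['AAA']

edges = pattern.findall("AAAxxxxBBBxxx")
print(edges)  # ['AAA', 'BBB']

# Caveat: the boundary characters are still consumed by each match.
# In the question's test string the c between EBB and DDD is used up by
# the EBB match, so nothing is left in front of DDD and it is not found.
question_text = "bcAc BBGFeQFFJaaBx EBBcDDD"
found = pattern.findall(question_text)
print(found)  # ['EBB']
```

So this pattern works when the groups are separated by at least two non-capital characters, but it can miss a group that shares its single separator with the previous match.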
Upvotes: 0
Reputation: 174
Not sure if this is the best solution, but I used negative lookbehinds and lookaheads to make sure that there were no capital letters in front of or behind the three-letter bits.
(?<![A-Z])[A-Z]{3}(?![A-Z])
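Here is a quick sketch of how this behaves with Python's `re` module, run against the test string from the question. The variable names are only for illustration.

```python
import re

pattern = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")

# (?<![A-Z]) and (?![A-Z]) are zero-width: they check the neighbouring
# characters without making them part of the match.
text = "bcAc BBGFeQFFJaaBx EBBcDDD"
matches = pattern.findall(text)
print(matches)  # ['EBB', 'DDD']

# The start and end of the string also pass the checks, because there is
# no capital letter there.
edge_matches = pattern.findall("AAAxxxxBBBxxx")
print(edge_matches)  # ['AAA', 'BBB']
```

Because the lookarounds do not consume any characters, two groups separated by a single character (EBB and DDD around the c) are both found.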
Upvotes: 0