Rebin

Reputation: 526

Correct Regex for Acronyms In Python

I want to find so-called acronyms in text. Is this the correct way to define the regex for it? My idea is that if something starts with a capital letter and ends with a capital letter, it is an acronym. Is this correct?

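import re
# Adjacent string literals are joined, so the text can span two source lines.
test_string = ("Department of Something is called DOS, "
               "or DoS,  or (DiS) or D.O.S. in United State of America, U.S.A./ USA")
# Either capital...capital with only letters between, or a run of "X." pairs.
pattern3 = r'([A-Z][a-zA-Z]*[A-Z]|(?:[A-Z]\.)+)'
print(re.findall(pattern3, test_string))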

The output is:

['DOS', 'DoS', 'DiS', 'D.O.S.', 'U.S.A.', 'USA']

Upvotes: 0

Views: 2187

Answers (1)

Greg

Reputation: 5588

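I think you can use the word boundary anchor \b for what you want to do. It requires the match to start and end at the edge of a word, so a capital-to-capital run inside a longer word is not picked up, and the optional \. at the end keeps the final period of dotted forms: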

>>> regex = r"\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?"
>>> re.findall(regex, "AbIA AoP U.S.A.")
['AbIA', 'AoP', 'U.S.A.']
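
To see where the \b anchors change the result, here is a small comparison of the pattern from the question and the one above. The sample sentence is made up for illustration, so treat this as a sketch rather than a thorough test:

```python
import re

# Pattern from the question: no word-boundary anchors.
question_pattern = r'([A-Z][a-zA-Z]*[A-Z]|(?:[A-Z]\.)+)'
# Pattern from this answer: anchored with \b on both sides.
answer_pattern = r"\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?"

# Hypothetical sample text, chosen to include a mixed-case name.
sample = "McDonald sells in the USA and U.S.A."

question_hits = re.findall(question_pattern, sample)
answer_hits = re.findall(answer_pattern, sample)

print(question_hits)  # ['McD', 'USA', 'U.S.A.']
print(answer_hits)    # ['USA', 'U.S.A.']
```

Without the anchors, the first pattern takes "McD" out of "McDonald". Note that the anchored pattern still accepts any word that starts and ends with a capital letter, so it follows the rule from the question rather than checking that something really is an acronym.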

Upvotes: 2
