Correct Regex for Acronyms In Python

Question

I want to find so-called acronyms in text. Is this the correct way to define the regex for it? My idea is that anything that starts with a capital letter and ends with a capital letter is an acronym. Is this correct?

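import re

# Adjacent string literals are joined, so the sentence can span two lines.
test_string = ("Department of Something is called DOS, "
               "or DoS,  or (DiS) or D.O.S. in United State of America, U.S.A./ USA")
# Either capital ... capital with letters in between, or dotted capitals such as D.O.S.
pattern3 = r'([A-Z][a-zA-Z]*[A-Z]|(?:[A-Z]\.)+)'
print(re.findall(pattern3, test_string))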

and the output is:

['DOS', 'DoS', 'DiS', 'D.O.S.', 'U.S.A.', 'USA']

Greg · Accepted Answer

I think you can use the word boundary anchor \b for what you want to do:

>>> import re
>>> regex = r"\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?"
>>> re.findall(regex, "AbIA AoP U.S.A.")
['AbIA', 'AoP', 'U.S.A.']
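
To see what the \b anchors change, here is a small sketch that runs both patterns side by side. The helper name find_acronyms and the second sample sentence are made up for illustration and are not part of the question. On the question's own test string the two patterns should agree; the difference shows up when capitals sit in the middle or at the end of a word that starts in lowercase.

```python
import re

# Pattern from the question: no word-boundary anchors.
QUESTION_PATTERN = r'([A-Z][a-zA-Z]*[A-Z]|(?:[A-Z]\.)+)'
# Pattern from the answer: \b before the first and after the last capital.
ANSWER_PATTERN = r"\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?"


def find_acronyms(pattern, text):
    """Hypothetical helper: return every match of `pattern` in `text`."""
    return re.findall(pattern, text)


question_text = ("Department of Something is called DOS, or DoS, or (DiS) "
                 "or D.O.S. in United State of America, U.S.A./ USA")
# Illustrative sentence (an assumption, not from the question).
mixed_case_text = "iOS and macOS run on ARM"

for label, pattern in (("question", QUESTION_PATTERN), ("answer", ANSWER_PATTERN)):
    print(label, find_acronyms(pattern, question_text))
    print(label, find_acronyms(pattern, mixed_case_text))
```

Without the anchors, the question's pattern picks up the trailing "OS" inside "iOS" and "macOS"; with \b, only the standalone "ARM" is reported. Whether fragments like that should count as acronyms depends on your data, so treat this as a starting point rather than a complete definition.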
