Reputation: 5566
I am trying to develop a regex to match acronyms inside parentheses.
regex = r"\(\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?\)"
This is what I have so far. It works for most cases, but not all: an acronym on its own line (not preceded or followed by any other characters) is also matched by this regex, even when it is not surrounded by parentheses.
Input it should match:
Any acronym in parentheses, e.g.:
(ADF)
Input it shouldn't match, but does:
An acronym on its own line, e.g.:
ADF
Any idea what I have done wrong?
Upvotes: 1
Views: 353
Reputation: 2348
Here's what I have so far. I'm using the "extension notation" (a named group, `(?P<acronym>...)`) to capture the acronym. The pattern is basically r"\([A-Z]\w*[A-Z]\)"; the test cases below show what it matches and what it doesn't.
import re

tst1 = "this is (ADF) a test"
tst2 = "This is is ADF a test, too."
# Newline tests
tst3 = "\n(ADF)\n"
tst4 = "\nADF\n"
# Upper/lowercase tests
tst5 = "This is (AdF) a test."
tst6 = "This is (Adf) a test."

def retest(testcase):
    # The named group "acronym" captures the match, parentheses included.
    res = re.search(r"(?P<acronym>\([A-Z]\w*[A-Z]\))", testcase)
    if res:
        print(res.group("acronym"))
    else:
        print(None)

retest(tst1)  # (ADF)
retest(tst2)  # None
retest(tst3)  # (ADF)
retest(tst4)  # None
retest(tst5)  # (AdF)
retest(tst6)  # None
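To compare the two patterns side by side, here is a minimal sketch. The names `QUESTION_RE`, `ANSWER_RE` and `find_acronyms` are made up for illustration; the patterns themselves are copied from the question and from this answer. Since both start with a literal `\(`, neither should match a bare ADF on its own line, so if that still happens for you, the cause is probably somewhere outside the regex itself.

```python
import re

# Pattern from the question (allows dots inside the acronym).
QUESTION_RE = re.compile(r"\(\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?\)")
# Simplified pattern from this answer (\w does not match dots).
ANSWER_RE = re.compile(r"\([A-Z]\w*[A-Z]\)")

def find_acronyms(text, pattern=ANSWER_RE):
    """Return every parenthesised match in text, parentheses included."""
    return [m.group(0) for m in pattern.finditer(text)]

if __name__ == "__main__":
    for sample in ["this is (ADF) a test", "\nADF\n", "see (A.D.F.) here"]:
        print(repr(sample),
              find_acronyms(sample, QUESTION_RE),
              find_acronyms(sample))
```

One difference to be aware of: the question's pattern accepts dotted forms such as (A.D.F.), while the simplified one does not, so pick whichever fits your data.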
Upvotes: 1