Grab abbreviations based on the number of capitalized characters preceding acronym

Question

I have a program that looks for acronyms in a paragraph and defines them from the preceding words, taking as many words as there are characters in the acronym. However, my code breaks for acronyms whose definitions contain words like "in" and "and" that aren't part of the acronym. Basically, I only want a preceding word to count toward the acronym's length if it starts with a capital letter.

import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."
allabbre = []

for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'

    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
        print(abbr_keywords)

Current Output:

All Awesome Dudes (AAD)
Measurement, and Pain Assessment in Clinical Trials (IMMPACT)

Desired Output:

All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)

Smart Manoj · Accepted Answer

import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."
allabbre = []

for match in re.finditer(r"\((.*?)\)", s):
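    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    # Walk backwards through the preceding words, counting only those that
    # start with a capital letter, until there is one per acronym letter.
    count = 0
    for k, word in enumerate(words[::-1]):
        if word[0].isupper():
            count += 1
        if count == size:
            break
    # k is the offset (from the end) of the earliest word that is needed.
    words = words[-k - 1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'

    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
        print(abbr_keywords)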

Output:

All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
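
The same idea can be packaged as a reusable function. The following is only a sketch: the name `expand_acronyms` is made up for illustration, and it assumes acronyms consist solely of uppercase letters inside parentheses, which is narrower than the `(.*?)` pattern used above. It walks backwards from the acronym and stops once it has seen one capitalized word per acronym letter. If there are fewer capitalized words than letters, it simply keeps every preceding word.

```python
import re

def expand_acronyms(text):
    """Return 'Definition (ABBR)' strings for each parenthesized acronym.

    Hypothetical helper: assumes an acronym is one or more uppercase
    letters wrapped in parentheses, e.g. "(AAD)".
    """
    results = []
    for match in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        # Step backwards over the preceding words; only words that start
        # with a capital letter count toward the acronym's length.
        remaining = len(abbr)
        start = len(words)
        while start > 0 and remaining > 0:
            start -= 1
            if words[start][0].isupper():
                remaining -= 1
        entry = " ".join(words[start:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results

s = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."
for entry in expand_acronyms(s):
    print(entry)
# All Awesome Dudes (AAD)
# Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
```

Note that any capitalized word counts, including one that is capitalized only because it starts a sentence, so a definition that sits right after a sentence start may pick up an extra word.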
