tenebris silentio

Reputation: 519

Grab abbreviation definitions based on the number of capitalized words preceding the acronym

I have a program that looks for acronyms in a paragraph and defines each one from the words that precede it, taking as many words as there are characters in the acronym. However, my code has problems with definitions that contain words like "in" and "and", which aren't part of the acronym. Basically, I only want a preceding word to count toward the total if it starts with a capital letter.

import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."
allabbre = []

for match in re.finditer(r"\((.*?)\)", s):
 start_index = match.start()
 abbr = match.group(1)
 size = len(abbr)
 words = s[:start_index].split()[-size:]
 definition = " ".join(words)
 abbr_keywords = definition + " " + "(" + abbr + "}"
 pattern = '[A-Z]'

 if re.search(pattern, abbr):
     if abbr_keywords not in allabbre:
        allabbre.append(abbr_keywords)
     print(abbr_keywords)

Current Output:

All Awesome Dudes (AAD}
Measurement, and Pain Assessment in Clinical Trials (IMMPACT}

**Desired Output:**
```none
All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
```

Upvotes: 0

Views: 127

Answers (2)

Andrej Kesely

Reputation: 195553

My take on the problem:

txt = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."

import re
from itertools import groupby

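# Capitalized words in reverse order, grouped into runs of all-caps tokens (abbreviations) and other words
words = re.findall(r'\b[A-Z].+?\b', txt)[::-1]
t = [list(g) if v else list(g)[::-1] for v, g in groupby(words, lambda k: k.upper() == k)]
# Pair each abbreviation with the run of capitalized words that precedes it in the text
for a, b in zip(t[::2], t[1::2]):
    abbr = a[0]
    meaning = b[len(b) - len(abbr):]
    # Accept the pair only if each letter is the initial of the corresponding word
    if all(c1 == c2[0] for c1, c2 in zip(abbr, meaning)):
        print(' '.join(meaning), '(' + abbr + ')')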

Prints:

Initiative Methods Measurement Pain Assessment Clinical Trials (IMMPACT)
All Awesome Dudes (AAD)

Upvotes: 1

Smart Manoj

Reputation: 5843

import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the  Initiative on Methods, Measurement, and Pain Assessment in Clinical  Trials (IMMPACT)."
allabbre = []

for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    # Walk backward, counting only words that start with a capital letter
    count = 0
    for k, word in enumerate(words[::-1]):
        if word[0].isupper():
            count += 1
        if count == size:
            break
    # Keep everything from the last counted word up to the abbreviation
    words = words[-k - 1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'

    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
        print(abbr_keywords)

Output:

All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
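
As a rough sketch only, the same backward scan could be wrapped in a function. The name `expand_abbreviations` is hypothetical, and the stricter pattern `\(([A-Z]+)\)` is an assumption that abbreviations are all uppercase letters. Unlike the loop above, it skips an abbreviation when too few capitalized words precede it, rather than falling back to every preceding word.

```python
import re


def expand_abbreviations(text):
    """Return 'definition (ABBR)' strings for each parenthesized abbreviation.

    Hypothetical helper: walks backward from each '(ABBR)' and counts only
    words that start with a capital letter, so connectors such as 'on',
    'and' and 'in' stay in the definition without being counted.
    """
    results = []
    # Assumption: an abbreviation consists of uppercase letters only.
    for match in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        capitals_seen = 0
        start = None
        # Scan from the word nearest the parenthesis toward the beginning.
        for index in range(len(words) - 1, -1, -1):
            if words[index][0].isupper():
                capitals_seen += 1
            if capitals_seen == len(abbr):
                start = index
                break
        # Skip abbreviations with too few capitalized words before them.
        if start is None:
            continue
        entry = " ".join(words[start:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results


s = ("Too many people, but not All Awesome Dudes (AAD) only care about the "
     "Initiative on Methods, Measurement, and Pain Assessment in Clinical "
     "Trials (IMMPACT).")
for line in expand_abbreviations(s):
    print(line)
```

Like the loop above, this only counts capitalized words and does not check that their initials actually spell the abbreviation, so a capitalized word that is not part of the definition would still be counted.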

Upvotes: 1
