Reputation: 519
I have a program that looks for acronyms in a paragraph and defines each one from the words that precede it, taking as many words as the acronym has characters. However, when the definition contains words like "in" and "and" that aren't part of the acronym, my code picks the wrong words. Basically, I only want a preceding word to count toward the acronym's length if it starts with a capital letter.
import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
allabbre = []
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + "}"
    pattern = '[A-Z]'
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
Current Output:
All Awesome Dudes (AAD}
Measurement, and Pain Assessment in Clinical Trials (IMMPACT}
**Desired Output:**
```none
All Awesome Dudes (AAD}
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
```
Upvotes: 0
Views: 127
Reputation: 195553
My take on the problem:
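import re
from itertools import groupby

txt = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."

# Collect the capitalized words in reverse order, then group them into
# alternating runs: all-uppercase words (acronyms) and ordinary capitalized words.
# The ordinary runs are flipped back into reading order.
t = [list(g) if v else list(g)[::-1] for v, g in groupby(re.findall(r'\b[A-Z].+?\b', txt)[::-1], lambda k: k.upper() == k)]
for a, b in zip(t[::2], t[1::2]):
    # Take as many trailing words as the acronym has letters.
    abbr, meaning = a[0], b[len(b) - len(a[0]):]
    # Accept the pair only if every letter matches the initial of its word.
    if all(c1 == c2[0] for c1, c2 in zip(abbr, meaning)):
        print(' '.join(meaning), '(' + abbr + ')')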
Prints:
Initiative Methods Measurement Pain Assessment Clinical Trials (IMMPACT)
All Awesome Dudes (AAD)
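The one-liner above can be hard to follow, so here is a step-by-step sketch of the same idea, walking the text left to right instead of in reverse. The function name `pair_acronyms` and its return shape are my own choices for illustration, not part of the original snippet. Like the code above, it only keeps capitalized words, so lowercase connectors such as "on" and "and" are dropped from the meaning.

```python
import re
from itertools import groupby


def pair_acronyms(text):
    """Hypothetical helper: return (abbr, meaning_words) pairs found in text."""
    # Every word that starts with a capital letter, in reading order.
    caps = re.findall(r'\b[A-Z].+?\b', text)
    # Alternating runs: ordinary capitalized words vs. all-uppercase words.
    runs = [(is_abbr, list(group))
            for is_abbr, group in groupby(caps, key=lambda w: w.upper() == w)]
    pairs = []
    # Look at each run together with the run that follows it.
    for (left_is_abbr, words), (_, abbrs) in zip(runs, runs[1:]):
        if left_is_abbr:
            continue  # we want "ordinary words" followed by "acronym"
        abbr = abbrs[0]
        meaning = words[-len(abbr):]
        # Require one word per letter, each starting with that letter.
        if len(meaning) == len(abbr) and all(c == w[0] for c, w in zip(abbr, meaning)):
            pairs.append((abbr, meaning))
    return pairs
```

Calling `pair_acronyms(txt)` and joining each word list with spaces should give the same two definitions as above, only in reading order.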
Upvotes: 1
Reputation: 5843
import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
allabbre = []
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    # Walk backwards through the preceding words, counting only the ones
    # that start with a capital letter, until there is one per acronym letter.
    count = 0
    for k, i in enumerate(words[::-1]):
        if i[0].isupper():
            count += 1
        if count == size:
            break
    # Keep everything from the last counted word onward, lowercase words included.
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
Output:
All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
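If you want to reuse this, the same backward walk can be wrapped in a function. This is only a sketch: the name `find_acronym_definitions` is made up here, and I am assuming that acronyms are written entirely in uppercase letters inside the parentheses, which is stricter than the `[A-Z]` check above. It also skips an acronym when there are not enough capitalized words in front of it, instead of returning a partial definition.

```python
import re


def find_acronym_definitions(text):
    """Hypothetical helper: return 'definition (ABBR)' strings in order of appearance."""
    results = []
    # Assumption: an acronym is a run of uppercase letters in parentheses.
    for match in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        needed = len(abbr)
        start = None
        # Walk backwards; only capitalized words count toward the acronym length.
        for idx in range(len(words) - 1, -1, -1):
            if words[idx][0].isupper():
                needed -= 1
                if needed == 0:
                    start = idx
                    break
        if start is None:
            continue  # not enough capitalized words before this acronym
        entry = " ".join(words[start:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results
```

For example, `print("\n".join(find_acronym_definitions(s)))` should print the same two lines as the output above.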
Upvotes: 1