Reputation: 13
I want to extract only specific words inside parentheses. For example, given the word list ['foo', 'bar'] and the string "alpha bravo (charlie foo bar delta) foxtrot", I want to get "alpha bravo foo bar foxtrot". Here is my attempt, which does not work:
import re
word_list = ['foo', 'bar']
string = 'alpha bravo (charlie foo bar delta) foxtrot'
print(re.sub(r"\([^()]*\b({})\b[^()]*\)".format('|'.join(word_list)), r'\1', string, flags = re.I))
I expected "alpha bravo foo bar foxtrot", but the result was "alpha bravo bar foxtrot". How can I solve this problem?
Upvotes: 1
Views: 136
Reputation: 521289
Here is a regex-based approach using re.sub with callback logic:
import re
word_list = ['foo', 'bar']
regex = r'\b(?:' + '|'.join(word_list) + r')\b' # \b(?:foo|bar)\b
string = 'alpha bravo (charlie foo bar delta) foxtrot'
def repl(m):
    # group(1) is set only when the match is a parenthesized term
    if m.group(1):
        return ' '.join(re.findall(regex, m.group(1)))
    else:
        return m.group(0)
print(re.sub(r'\((.*?)\)|\w+', repl, string))
This prints:
alpha bravo foo bar foxtrot
For an explanation, we do a global regex search on the following pattern:
\((.*?)\)|\w+
This first tries to match a term in parentheses. For every match, re.sub passes the match object to the callback function repl(). If the match was a parenthesized term, m.group(1) holds the text between the parentheses, and the callback uses re.findall with the word-list regex to keep only the words you want. Otherwise, the pattern matches one word at a time via \w+, and the callback returns that word unchanged.
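If the input can contain several parenthesized groups, or the words may contain regex metacharacters, the same idea can be wrapped in a function. This is only a sketch: the name keep_words_in_parens and its flags parameter are made up for illustration, it matches only the parenthesized groups (dropping the \w+ alternative, since re.sub leaves unmatched text alone anyway), and it assumes the parentheses are not nested.

```python
import re

def keep_words_in_parens(text, words, flags=0):
    """Inside each (...) group, keep only the words listed in `words`.

    Hypothetical helper for illustration; assumes non-nested parentheses.
    """
    # re.escape guards against words that contain regex metacharacters.
    word_re = re.compile(
        r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', flags
    )

    def repl(m):
        # m.group(1) is the text between the parentheses.
        return ' '.join(word_re.findall(m.group(1)))

    # Only parenthesized groups are matched; other text is left untouched.
    return re.sub(r'\(([^()]*)\)', repl, text)

print(keep_words_in_parens(
    'alpha (x foo) bravo (bar y FOO) charlie', ['foo', 'bar'], re.I
))
# alpha foo bravo bar FOO charlie
```

Note that a group containing none of the listed words is replaced by an empty string, which leaves two adjacent spaces behind; collapse whitespace afterwards if that matters.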
Upvotes: 1
Reputation: 5889
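Here is my homemade recipe:
import re
word_list = ['foo', 'bar']
string = 'alpha bravo (charlie foo bar delta) foxtrot'
# Split into the text before, inside, and after the (single) parenthesized group
parts = re.split(r'\(|\)', string)
# Strip the leading space of the tail so the output has no double space
text = [parts[0], parts[2].lstrip()]
count = 0
for element in parts[1].split():
    if element in word_list:
        # Insert each kept word between the head and the tail, in order
        count += 1
        text.insert(count, element + ' ')
print(''.join(text))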
Upvotes: 0