ywkim

Reputation: 13

Extract only specific words inside parentheses

I want to keep only specific words that appear inside parentheses and drop the rest of the parenthesized text. For example, given the word list ['foo', 'bar'] and the string "alpha bravo (charlie foo bar delta) foxtrot", I want to get "alpha bravo foo bar foxtrot". Here is what I have tried so far:

import re

word_list = ['foo', 'bar']
string = 'alpha bravo (charlie foo bar delta) foxtrot'
print(re.sub(r"\([^()]*\b({})\b[^()]*\)".format('|'.join(word_list)), r'\1', string, flags=re.I))

I expected to get "alpha bravo foo bar foxtrot", but the result was "alpha bravo bar foxtrot". How can I solve this problem?

Upvotes: 1

Views: 136

Answers (2)

Tim Biegeleisen

Reputation: 521289

Here is a regex based approach using re.sub with callback logic:

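import re

word_list = ['foo', 'bar']
regex = r'\b(?:' + '|'.join(word_list) + r')\b'         # \b(?:foo|bar)\b
string = 'alpha bravo (charlie foo bar delta) foxtrot'

def repl(m):
    # group(1) is set only when the match was a parenthesized group
    if m.group(1):
        # keep only the listed words found inside the parentheses
        return ' '.join(re.findall(regex, m.group(1)))
    else:
        # a plain word outside parentheses is returned unchanged
        return m.group(0)

print(re.sub(r'\((.*?)\)|\w+', repl, string))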

This prints:

alpha bravo foo bar foxtrot

For an explanation, we do a global regex search on the following pattern:

\((.*?)\)|\w+

At each position, this first tries to match a parenthesized group and captures its contents in group 1. If that fails, it matches a single word instead. re.sub passes every match to the callback function repl(). When group 1 was captured, repl() uses re.findall with your list of words to keep only the matches you want from inside the parentheses and joins them with spaces. Otherwise, the match is a single word outside parentheses, and repl() returns it unchanged.
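
As a variation, here is a minimal sketch of the same callback idea wrapped in a reusable function. The name keep_listed_words and the ignore_case parameter are hypothetical and not part of the code above. It assumes parentheses are not nested and that word_list is non-empty. It also matches only the parenthesized groups, so the \w+ alternative is not needed, and it passes each word through re.escape before building the pattern.

```python
import re

def keep_listed_words(text, word_list, ignore_case=False):
    """Replace each parenthesized group with only the listed words it contains."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps regex metacharacters in a word (such as '.') from
    # being interpreted as pattern syntax
    word_regex = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in word_list) + r')\b', flags
    )

    def repl(m):
        # m.group(1) is the text between the parentheses
        return ' '.join(word_regex.findall(m.group(1)))

    # Only parenthesized groups are matched; text outside them is left untouched
    return re.sub(r'\(([^()]*)\)', repl, text)

result = keep_listed_words('alpha bravo (charlie foo bar delta) foxtrot', ['foo', 'bar'])
print(result)  # alpha bravo foo bar foxtrot
```

Note that if a parenthesized group contains none of the listed words, it is replaced with an empty string, which leaves the two surrounding spaces next to each other.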

Upvotes: 1

Buddy Bob

Reputation: 5889

Here is my homemade recipe. It assumes the string contains exactly one pair of parentheses:

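import re
word_list = ['foo', 'bar']
string = 'alpha bravo (charlie foo bar delta) foxtrot'
# Split on the parentheses: parts is [before, inside, after]
parts = re.split(r'\(|\)', string)
# Strip the outer pieces so joining with spaces does not double them up
text = [parts[0].strip(), parts[2].strip()]
count = 0
for element in parts[1].split():
    if element in word_list:
        count += 1
        # Insert each kept word right after the previously inserted one
        text.insert(count, element)
print(' '.join(text))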
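
If the string may contain more than one parenthesized group, the splitting idea can be extended as sketched below. This is only an illustration under the assumption that parentheses are balanced and not nested; the function name keep_words_by_splitting is hypothetical. It relies on the documented behavior of re.split: when the pattern contains a capturing group, the captured text is included in the result list, so the pieces at odd indexes are the contents of the parentheses.

```python
import re

def keep_words_by_splitting(text, word_list):
    # The capturing group makes re.split return the text inside each pair of
    # parentheses at the odd indexes: [outside, inside, outside, inside, ...]
    pieces = re.split(r'\(([^()]*)\)', text)
    for i in range(1, len(pieces), 2):
        # Keep only the listed words, preserving their original order
        pieces[i] = ' '.join(w for w in pieces[i].split() if w in word_list)
    return ''.join(pieces)

result = keep_words_by_splitting('alpha (x foo) bravo (bar y) charlie', ['foo', 'bar'])
print(result)  # alpha foo bravo bar charlie
```

Unlike the regex word match, the "in word_list" test here is an exact, case-sensitive comparison of whitespace-separated tokens.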

Upvotes: 0
