Removing a list of letter groupings and words from data-frame populated with sentences

Question

I have a dataframe df which contains uncleaned text strings:

                             phrase
 0           the quick brown br fox
 1   jack and jill went up the hill

I also have a list of words and letter groupings that I'd like to remove, called remove, which looks like:

['br', 'and']

In this example I'd like the following output:

                         phrase
 0          the quick brown fox
 1   jack jill went up the hill

Note that the 'br' in 'brown' remains in df, as it's part of a larger word, but the 'br' on its own is removed.

I've tried:

df['phrase']=[re.sub(r"\b%remove\b", "", sent) for sent in df['phrase']]

But can't get it to work correctly. What can I try next?

jezrael · Accepted Answer

Use a nested list comprehension: split each sentence into words, test membership with in, and join the remaining words back together:

L = ['br', 'and']

df['phrase']=[' '.join(x for x in sent.split() if x not in L) for sent in df['phrase']]
print(df)
                       phrase
0         the quick brown fox
1  jack jill went up the hill
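
If you would rather stay with re.sub, as in the question, note that "%remove" inside the raw string is not interpolated, so that pattern looks for the literal text %remove. A minimal sketch of a regex version follows, assuming every entry in remove starts and ends with a word character (so that \b behaves as expected). The helper name strip_words and the plain list phrases are made up for illustration; phrases stands in for df['phrase'].

```python
import re

remove = ["br", "and"]

# Build one pattern such as r"\b(?:br|and)\b". re.escape keeps any regex
# metacharacters in the list entries literal.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, remove)) + r")\b")


def strip_words(sentence):
    """Remove whole-word matches, then tidy the whitespace left behind."""
    cleaned = pattern.sub("", sentence)
    # split() + join collapses the double spaces left where a word was removed
    return " ".join(cleaned.split())


# Stand-in for df['phrase'] (hypothetical sample data from the question)
phrases = ["the quick brown br fox", "jack and jill went up the hill"]
cleaned = [strip_words(s) for s in phrases]
print(cleaned)
```

On the dataframe this would be df['phrase'] = [strip_words(s) for s in df['phrase']]. One difference from the split approach: the regex also matches a listed word that sits next to punctuation (for example "and,"), whereas splitting on whitespace only drops exact tokens.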
