Reputation: 73
I'm new to regex and I'd like to perform the following operation in pandas:
In a Series s, I have the following words: foo, bar, baz.
In a DataFrame df, I have:
index string
0 foobright foo barber baz bare
1 foo bar barret bazar
I'd like to remove all occurrences of foo, bar, baz from the DataFrame df, but only where they appear as separate words.
The output I'm looking for is a DataFrame out:
index string
0 foobright barber bare
1 barret bazar
I cannot figure out the regex to perform this operation.
Can anyone help me out?
Thank you
Upvotes: 1
Views: 46
Reputation: 863751
Regex is not necessary here: split each value on whitespace, filter out the words that are in the Series, and join the remaining words back together with join inside a generator expression:
s = pd.Series(['foo','bar','baz'])
words = set(s)  # build the lookup once instead of calling s.tolist() for every word
df['string'] = [' '.join(x for x in a.split() if x not in words) for a in df['string']]
print(df)
string
0 foobright barber bare
1 barret bazar
Or use a lambda function with apply:
s = pd.Series(['foo','bar','baz'])
words = set(s)  # build the lookup once instead of calling s.tolist() for every word
f = lambda a: ' '.join(x for x in a.split() if x not in words)
df['string'] = df['string'].apply(f)
print(df)
string
0 foobright barber bare
1 barret bazar
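For anyone who wants to run the split-and-filter approach end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the names stop_words, drop_words and out are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"string": ["foobright foo barber baz bare",
                              "foo bar barret bazar"]})
s = pd.Series(["foo", "bar", "baz"])

# Build the lookup set once; set membership tests are fast.
stop_words = set(s)

def drop_words(text):
    """Remove whitespace-delimited tokens that appear in stop_words."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

# assign() returns a new DataFrame, so df itself is left unchanged.
out = df.assign(string=df["string"].map(drop_words))
print(out["string"].tolist())
```

Note that split() with no arguments also collapses runs of whitespace, so the result never contains double spaces where a word was removed.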
Upvotes: 4
Reputation: 190
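In Notepad++, find with the following regular expression. The alternation is grouped so that the boundaries apply to every word, and lookarounds are used because $ inside a character class is a literal dollar sign, not an end-of-line anchor:
(?<!\S)(?:foo|bar|baz)(?!\S)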
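A regex along these lines can also be applied directly in pandas. This is a rough sketch rather than a definitive solution: it assumes the sample data from the question, treats a "separate word" as a whitespace-delimited token, and uses illustrative names (words, pattern, cleaned).

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"string": ["foobright foo barber baz bare",
                              "foo bar barret bazar"]})
words = ["foo", "bar", "baz"]

# (?<!\S) and (?!\S) require whitespace or the string boundary on each side,
# so "bar" inside "barber" or "baz" inside "bazar" is not matched.
pattern = r"(?<!\S)(?:" + "|".join(map(re.escape, words)) + r")(?!\S)"

cleaned = (
    df["string"]
    .str.replace(pattern, "", regex=True)   # drop the standalone words
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover whitespace
    .str.strip()                            # trim leading/trailing spaces
)
print(cleaned.tolist())
```

The non-capturing group matters: without it, the alternation splits the whole pattern, and a bare bar alternative would match inside barber.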
Upvotes: 0