jerof

Reputation: 73

Regex - choosing only separate words in a string

I'm new to Regex and I'd like to perform the following operation in Pandas:

index  string
0      foobright foo barber baz bare
1      foo bar barret bazar

I'd like to remove all occurrences of foo, bar, and baz from the DataFrame df, but only where they appear as separate words.

The output I'm looking for is a DataFrame out:

index  string
0      foobright barber bare
1      barret bazar

I cannot figure out the regex to perform this operation.

Can anyone help me out?

Thank you

Upvotes: 1

Views: 46

Answers (2)

jezrael

Reputation: 863751

Regex is not necessary here: split each value on whitespace, drop the tokens that appear in the Series, and join the remaining tokens back together with a generator expression passed to join:

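import pandas as pd

s = pd.Series(['foo','bar','baz'])
words = set(s)  # build the lookup once rather than calling s.tolist() for every token
# keep only the whitespace-separated tokens that are not in the word list
df['string'] = [' '.join(x for x in a.split() if x not in words) for a in df['string']]

print(df)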
                  string
0  foobright barber bare
1           barret bazar

Or use a lambda function with apply:

s = pd.Series(['foo','bar','baz'])
words = set(s)  # build the lookup once
f = lambda a: ' '.join(x for x in a.split() if x not in words)
df['string'] = df['string'].apply(f)

print(df)
                  string
0  foobright barber bare
1           barret bazar

Upvotes: 4

farooq

Reputation: 190

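In Notepad++, use Find with the following regular expression. \b marks a word boundary, so only whole words match, and the group keeps the alternation from splitting the boundaries apart:

\b(?:foo|bar|baz)\b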
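Since the question asks about Python rather than an editor, here is a minimal sketch of the word-boundary idea using the standard re module. The names WORDS, PATTERN and remove_words are made up for illustration, and the sample rows are the two strings from the question.

```python
import re

# Hypothetical names for illustration; the word list comes from the question.
WORDS = ['foo', 'bar', 'baz']

# \b matches at a word boundary, so 'foo' matches on its own
# but not inside 'foobright'. re.escape guards against words
# that contain regex metacharacters.
PATTERN = re.compile(r'\b(?:' + '|'.join(map(re.escape, WORDS)) + r')\b')

def remove_words(text):
    """Delete standalone occurrences of WORDS, then collapse leftover whitespace."""
    stripped = PATTERN.sub('', text)
    return ' '.join(stripped.split())

rows = ['foobright foo barber baz bare', 'foo bar barret bazar']
cleaned = [remove_words(r) for r in rows]
print(cleaned)
```

On a DataFrame this could probably be applied with df['string'].apply(remove_words). One difference from the split-based answer: \b also treats punctuation as a boundary, so a token like foo-bar would lose both halves here, whereas splitting on whitespace would leave it untouched.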

Upvotes: 0
