str.contains: match only the exact value

Question

I have the following list:

personnages = ['Stanley','Kevin', 'Franck']

I want to use the str.contains function to create a new pandas DataFrame, df3:

df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]

However, if a row of the speaker column contains 'Stanley & Kevin', I don't want it in df3.

How can I improve my code to do this?

C.Nivs · Accepted Answer

You'll want to anchor the start and end of the string in your regex, so that a row matches only when its value is exactly one of the names:

import pandas as pd

speakers = ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']
df = pd.DataFrame([{'speaker': speaker} for speaker in speakers])
         speaker
0        Stanley
1          Kevin
2          Frank
3  Kevin & Frank


r = '|'.join(speakers[:-1]) # gets all but the last one for the sake of example

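# ^ anchors the start of the string and $ anchors the end;
# (?:...) is a non-capturing group, which avoids the pandas UserWarning about match groups
df[df['speaker'].str.contains(f'^(?:{r})$')]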
   speaker
0  Stanley
1    Kevin
2    Frank
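
As a side note, here is a minimal sketch of two alternatives to the anchored regex, run against the names from the question. The sample df2 is made up for illustration, and the variable names df3_isin and df3_full are hypothetical. Since the goal is an exact match against a list, Series.isin may be all that's needed. If you do want a regex, Series.str.fullmatch (available in pandas 1.1 and later, as far as I know) requires the whole string to match, so no ^ or $ is needed. Wrapping the names in re.escape is a precaution in case a name ever contains regex metacharacters.

```python
import re
import pandas as pd

personnages = ['Stanley', 'Kevin', 'Franck']

# Made-up sample data standing in for the asker's df2 (assumption)
df2 = pd.DataFrame({'speaker': ['Stanley', 'Kevin', 'Franck', 'Stanley & Kevin']})

# Option 1: exact membership test, no regex involved
df3_isin = df2[df2['speaker'].isin(personnages)]

# Option 2: regex that must match the entire string
# re.escape guards against regex metacharacters inside a name
alternatives = '|'.join(re.escape(name) for name in personnages)
df3_full = df2[df2['speaker'].str.fullmatch(f'(?:{alternatives})')]

print(df3_isin)
print(df3_full)
```

Both options should drop the 'Stanley & Kevin' row. isin is the simpler choice when the names are literal strings; fullmatch is mainly useful if the list might later hold actual patterns.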
