zeroes_ones

Reputation: 191

Pandas: how to filter out rows containing a string pattern within a list in a column?

I have a data frame that looks similar to the following:

import pandas as pd

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'comments' : (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})

df

    employee_id   country_code   comments
0   123           US             [good performer, due for raise, should be promoted]
1   456           CAN            [bad performer, should be fired, speak to HR]
2   789           MEX            [recently hired, needs training, shows promise]

I would like to drop every row whose comments list contains the string 'performer'. To do so, I'm using:

df = df[~df['comments'].str.contains('performer')]

But this returns an error:

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Thanks in advance for any assistance you can give!

Upvotes: 1

Views: 1481

Answers (3)

G.G

Reputation: 765

df = df[~df['comments'].map(' '.join).str.contains('performer')]
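To spell out what this one-liner does, here is a minimal step-by-step sketch using the data frame from the question. The intermediate names joined, has_performer and filtered are only illustrative, and regex=False is an added assumption that 'performer' should be matched as a literal substring rather than as a regular expression.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': [['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise']],
})

# Step 1: collapse each list into one space-separated string.
joined = df['comments'].map(' '.join)

# Step 2: build a boolean mask; regex=False (an assumption here) treats
# 'performer' as a plain substring instead of a regex pattern.
has_performer = joined.str.contains('performer', regex=False)

# Step 3: keep the rows where the mask is False. The original list
# column is left untouched because only the temporary Series was joined.
filtered = df[~has_performer]
print(filtered)
```

One caveat with joining: because the comments are glued together with a space, a multi-word pattern such as 'raise should' would match across the boundary between two separate comments.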

Upvotes: 0

mozway

Reputation: 262594

As your Series holds lists rather than strings, str.contains cannot search inside them, so you cannot vectorize this directly. You can use a list comprehension:

df2 = df[[all('performer' not in comment for comment in comments)
          for comments in df['comments']]]

Output:

   employee_id country_code                                         comments
2          789          MEX  [recently hired, needs training, shows promise]
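If a pandas-only alternative to the comprehension is preferred, one possible sketch (not part of the answer above) is to explode the lists into one row per comment, test each comment, and then collapse the result back to one boolean per original row. The names exploded and has_match are illustrative, and regex=False and na=False are assumptions: a literal substring match, with missing values counted as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': [['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise']],
})

# One row per individual comment; the original row label is repeated
# in the index, so each comment can be traced back to its source row.
exploded = df['comments'].explode()

# True for every comment containing the substring; na=False makes
# missing values (e.g. from empty lists) count as "no match".
matches = exploded.str.contains('performer', regex=False, na=False)

# Collapse back to one boolean per original row: did any comment match?
has_match = matches.groupby(level=0).any()

# Keep the rows where no comment matched.
df2 = df[~has_match]
print(df2)
```

Unlike joining the list into one string, this checks each comment separately, so a pattern cannot match across two comments. For small frames the list comprehension is likely just as fast and simpler to read.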

Upvotes: 0

ArchAngelPwn

Reputation: 3046

If I understand correctly, you need to join each list in the comments column into a single string before filtering:

import pandas as pd

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'comments' : (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})
# Join each list of comments into one space-separated string
# (note: this overwrites the list column with strings)
df['comments'] = df['comments'].apply(' '.join)
# Keep only the rows whose joined comments do not contain 'performer'
df = df[~df['comments'].str.contains('performer')]
df

Upvotes: 1
