satoshi

Reputation: 439

Deleting data in pandas given a string condition

I am having trouble understanding the mechanics here given the following.

I have a DataFrame read from a .csv file:

  a1 b1 c1
1 aa bb cc
2 ab ba ca 

df.drop(df['a1'].str.contains('aa',case = False))

I want to drop all rows where column a1 contains 'aa'.

I believe I have tried everything suggested on here, but I still get:

ValueError: labels [False False False ... False False False] not contained in axis

Yes, I have also tried:

skipinitialspace=True
axis=1

Any help would be appreciated, thank you.

Upvotes: 2

Views: 120

Answers (1)

cs95

Reputation: 403128

str.contains returns a mask:

df['a1'].str.contains('aa',case = False)

1     True
2    False
Name: a1, dtype: bool

However, drop accepts index labels, not boolean masks. If you open up the help on drop, you may observe this first-hand:

?df.drop

Signature: df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.

Parameters
----------
labels : single label or list-like
    Index or column labels to drop.

You could work out the index labels from the mask and pass those to drop:

idx = df.index[df['a1'].str.contains('aa')]
df.drop(idx)

   a1  b1  c1
2  ab  ba  ca
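For anyone who wants to run this end to end, here is a minimal self-contained sketch of the mask-to-labels approach. The DataFrame is rebuilt by hand from the sample in the question (the real data comes from a .csv), and the names `mask`, `idx` and `result` are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; the original is read from a .csv.
df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)

# str.contains gives a boolean mask aligned with the index.
mask = df["a1"].str.contains("aa", case=False)

# Indexing df.index with the mask turns it into the labels drop expects.
idx = df.index[mask]

# drop returns a new DataFrame; df itself is left unchanged.
result = df.drop(idx)
print(result)
```

Note that drop does not modify df in place here, so assign the result back (df = df.drop(idx)) if you want to keep it.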

However, this is rather long-winded, so I'd recommend sticking to the idiomatic pandas way of dropping rows based on a condition, boolean indexing:

df[~df['a1'].str.contains('aa')]

   a1  b1  c1
2  ab  ba  ca
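A rough sketch of the same boolean-indexing idea on slightly messier data. The missing value and the mixed-case row are my own assumptions and are not in the question's sample. The point is that passing na=False keeps the mask strictly boolean so that ~ can invert it, and case=False matches the question's original call.

```python
import pandas as pd

# Hypothetical data: one missing value and one mixed-case match added for illustration.
df = pd.DataFrame({"a1": ["aa", "ab", None, "xAAy"]})

# na=False treats missing values as "no match", so the mask has no gaps.
mask = df["a1"].str.contains("aa", case=False, na=False)

# Keep only the rows where the mask is False.
result = df[~mask]
print(result)
```

Rows with a missing a1 are kept in this sketch; use na=True instead if you would rather drop them as well.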

If anyone is interested in removing rows that contain any of the strings in a list:

df = df[~df['a1'].str.contains('|'.join(my_list))]

Make sure to strip whitespace first. Credit to https://stackoverflow.com/a/45681254/9500464
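A hedged sketch of the list variant, with the whitespace stripping made explicit. Here my_list and the sample values are hypothetical. One addition that is not in the one-liner above: since str.contains treats the pattern as a regular expression by default, each item is passed through re.escape so that characters like "." are matched literally.

```python
import re

import pandas as pd

# Hypothetical data and search terms; note the stray spaces in my_list.
df = pd.DataFrame({"a1": ["aa", "ab", "c.c", "cxc"]})
my_list = ["aa ", " c.c"]

# Strip each term, escape regex metacharacters, then join with "|" (regex OR).
pattern = "|".join(re.escape(term.strip()) for term in my_list)

# Rows matching any term are dropped; "cxc" survives because "." is escaped.
mask = df["a1"].str.contains(pattern, na=False)
result = df[~mask]
print(result)
```

Without the strip, "aa " would not match "aa"; without the escape, "c.c" would also match "cxc".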

Upvotes: 6
