Ricardo Milhomem

Reputation: 169

Drop rows containing string Pandas

I am trying to remove the rows that contain a specific string in one column of a dataframe.

I thought of using a combination of the drop and iloc methods, because the column names are rather long and liable to change, so I would rather not reference the columns by name. However, I have not been able to combine the two into a function that takes the string as a parameter.

As an example, let's say I have the following dataframe:

    Nome    Nota
0   a   1.000000
1   b   1.250000
2   c   1.375000
3   d   1.437500
4   e   1.468750
5   f   1.484375
6   g   1.492188
7   h   1.496094
8   i   1.498047
9   j   1.499023
10  k   1.499512
11  l   1.499756
12  m   1.499878
13  n   1.499939
14  o   1.499969
15  p   1.499985
16  q   1.499992
17  r   1.499996
18  s   1.499998

Let's say I would like to drop every row containing the string 'm' in the first column. I tried using:

testdf.drop(testdf.columns[0] == 'm',inplace = True)

but it gave me the error message:

KeyError: '[False] not found in axis'

What am I getting wrong here?

Upvotes: 1

Views: 854

Answers (4)

danPho

Reputation: 87

You could specify a filter like this:

filter = df['Nome'] != 'm'

This produces a Boolean Series with one value per row; note that the value at index 12 is False:

0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
Name: Nome, dtype: bool

After that, apply the filter to the dataframe, and the row at index 12 will be removed:

df = df[filter]
print(df)

   Nome      Nota
0     a  1.000000
1     b  1.250000
2     c  1.375000
3     d  1.437500
4     e  1.468750
5     f  1.484375
6     g  1.492188
7     h  1.496094
8     i  1.498047
9     j  1.499023
10    k  1.499512
11    l  1.499756
13    n  1.499939
14    o  1.499969
15    p  1.499985
16    q  1.499992
17    r  1.499996
18    s  1.499998
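
Since the question asks to avoid column names, the same filter can be built by position. The following is a minimal sketch, assuming the column to check is the first one; the small dataframe and the names first_col and filtered are stand-ins for illustration, not part of the original post.

```python
import pandas as pd

# Small stand-in for the question's dataframe (illustrative values only)
df = pd.DataFrame({"Nome": ["a", "m", "c", "m"],
                   "Nota": [1.0, 1.25, 1.375, 1.4375]})

# Select the first column by position, so its name never appears in the code
first_col = df.iloc[:, 0]

# Keep only the rows whose first-column value is not 'm'
filtered = df[first_col != "m"]
print(filtered)
```

The surviving rows keep their original index labels, so gaps remain where rows were removed, just as index 12 is missing in the output above.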

Upvotes: 1

AfterFray

Reputation: 1851

Try this:

import pandas as pd

df = pd.DataFrame({'Nome': ['a', 'm', 'c', 'm'],
                   'Nota': [1.0, 1.1, 1.2, 1.3]})

# Keep the rows whose 'Nome' is not 'm', then renumber the index from 0
df = df.loc[df['Nome'] != 'm'].reset_index(drop=True)
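
If "containing" is meant as a substring match rather than exact equality, the comparison can be swapped for str.contains. This is only a sketch under that assumption; the value 'ham' and the names exact and substring are made up for illustration.

```python
import pandas as pd

# 'ham' is a hypothetical value that contains 'm' without being equal to it
df = pd.DataFrame({"Nome": ["a", "m", "ham", "c"],
                   "Nota": [1.0, 1.1, 1.2, 1.3]})

# Exact match: only the row whose value is exactly 'm' is removed
exact = df.loc[df["Nome"] != "m"].reset_index(drop=True)

# Substring match: every row whose value contains 'm' anywhere is removed.
# regex=False treats 'm' as a literal string; na=False counts missing values as non-matching.
contains_m = df["Nome"].str.contains("m", regex=False, na=False)
substring = df.loc[~contains_m].reset_index(drop=True)

print(exact)
print(substring)
```

With exact matching the 'ham' row survives; with the substring version it is dropped as well.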

Upvotes: 0

Anthony

Reputation: 1

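In this case, testdf.columns[0] == "m" compares the name of the first column (the string "Nome") with "m", so it evaluates to a single False rather than to one truth value per row. drop then looks for a row labelled False, which is why you get the KeyError. What you want instead is a Series of truth values, one per row in that column, used as an index into the DataFrame. You can do so using this line of code, which keeps every row whose value is not "m".

testdf = testdf[testdf["Nome"] != "m"]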

Hope this helps.

Upvotes: 0

SomeDude

Reputation: 14228

Use Boolean indexing

first_col = testdf.columns[0]
testdf = testdf[~(testdf[first_col]=='m')]
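
For comparison, drop can also be made to work if it is given row labels instead of a boolean. The sketch below, on a shortened stand-in dataframe, shows what the original expression evaluates to and one way to pass drop the labels of the matching rows; the names name_check and mask are illustrative only.

```python
import pandas as pd

# Shortened stand-in for the question's dataframe
testdf = pd.DataFrame({"Nome": ["a", "m", "c"],
                       "Nota": [1.0, 1.25, 1.375]})

# This compares the column *name* ('Nome') with 'm', giving one plain boolean
name_check = testdf.columns[0] == "m"

# Comparing the column's *values* gives one boolean per row
mask = testdf.iloc[:, 0] == "m"

# drop expects index labels, so pass the labels of the matching rows
testdf = testdf.drop(testdf[mask].index)
print(testdf)
```

Boolean indexing as shown above is usually the shorter route; the drop form is mainly useful when the row labels are already at hand.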

Upvotes: 0
