Catherine

Reputation: 119

Filtering for rows in a Pandas dataframe containing at least one zero

I am trying to delete all rows in a Pandas data frame that don't have a zero in either of two columns. My data frame is indexed from 0 to 620. This is my code:

for index in range(0, 621):
    if((zeroes[index,1] != 0) and (zeroes[index,3] != 0)):
        del(zeroes[index,])

I keep getting a key error. KeyError: (0, 1)

My instructor suggested I change the range to test to see if I have bad lines in my data frame. I did. I checked the tail of my dataframe and then changed the range to (616, 621). Then I got the key error: (616, 1).

Does anyone know what is wrong with my code or why I am getting a key error?

This code also produces a key error of (0,1):

index = 0
while (index < 621):
    if((zeroes[index,1] != 0) and (zeroes[index,3] != 0)):
        del(zeroes[index,])
    index = index + 1

Upvotes: 1

Views: 1863

Answers (1)

jpp

Reputation: 164773

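Don't use a manual for loop here. Your error occurs because df[x, y] effectively calls df.__getitem__((x, y)), which looks for a single column whose label is the tuple (x, y). No such column exists in your dataframe, so pandas raises KeyError: (0, 1). It is not a row/column lookup.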
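A minimal sketch of where the KeyError comes from, assuming a small made-up frame in place of your zeroes dataframe (the column names and values here are hypothetical). It also shows iloc, which is the accessor to use if you do need a single value by row and column position:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `zeroes` dataframe.
zeroes = pd.DataFrame(
    {"a": [5, 0, 7], "b": [0, 2, 3], "c": [1, 1, 1], "d": [4, 0, 9]}
)

key_error = None
try:
    # Square brackets with a tuple look for a column labelled (0, 1).
    zeroes[0, 1]
except KeyError as err:
    key_error = err
    print("KeyError:", err)

# Position-based scalar access: row 0, column at position 1 ("b").
value = zeroes.iloc[0, 1]
print(value)
```

Even with iloc, deleting rows one at a time inside a loop is slow and error-prone, which is why a vectorised filter is the better fit.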

Instead, use vectorised operations and Boolean indexing. For example, to keep only the rows where the column at position 1 or position 3 equals 0 (that is, to remove rows where neither does):

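df = df[df.iloc[:, [1, 3]].eq(0).any(axis=1)]

This works because eq(0) creates a dataframe of Boolean values indicating equality to zero, and any(axis=1) reduces each row to a single Boolean that is True when any value in that row is True. Indexing df with the resulting Boolean series keeps only those rows.

You can also write df.iloc[:, [1, 3]].eq(0).any(axis='columns'), which is equivalent and arguably clearer. Older answers often use the positional shorthand any(1), but pandas 2.0 and later require axis to be passed by keyword, so prefer the explicit form. See the docs for pd.DataFrame.any for more details.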
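Here is a small self-contained sketch of the filter on made-up data (the column names and values are hypothetical, chosen only to illustrate the technique). It builds the Boolean mask as a separate step so you can inspect it before filtering:

```python
import pandas as pd

# Hypothetical data: positions 1 and 3 are columns "b" and "d".
df = pd.DataFrame(
    {
        "a": [5, 0, 7, 8],
        "b": [0, 2, 3, 6],
        "c": [1, 1, 1, 1],
        "d": [4, 0, 9, 0],
    }
)

# True for each row where column "b" or column "d" equals zero.
mask = df.iloc[:, [1, 3]].eq(0).any(axis=1)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]

print(mask.tolist())
print(filtered)
```

In this sample the third row (index 2) has no zero in either column, so it is the only row dropped. The zero in column "a" of the second row does not affect the result, because only positions 1 and 3 are checked.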

Upvotes: 1
