Delete rows from pandas DataFrame with non-unique index

Question

I am looking for a way to delete rows in a pandas DataFrame when the index is not guaranteed to be unique.

For example, I want to drop the rows at positions 0 and 4 from my DataFrame df. This is the typical code you would use to do that:

df.drop(df.index[[0, 4]])

If every index label is unique, this works fine. However, if rows 0, 1, and 2 all share the same index label, this code drops rows 0, 1, 2, and 4 instead of just 0 and 4.
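A minimal sketch of the problem, assuming a frame rebuilt from the sample data shown in this question (the variable names `labels` and `result` are only illustrative). Because `drop` works on labels, the positions are first turned into labels, and every row carrying one of those labels is removed. With this particular data, rows 0 through 3 all share the label of row 0, so nothing is left:

```python
import pandas as pd

# Sample frame rebuilt from the question's data; "site" is a non-unique index.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Positions 0 and 4 resolve to the labels "mc03" and "mc04".
labels = df.index[[0, 4]]

# drop() removes every row whose label matches, not just positions 0 and 4.
result = df.drop(labels)

print(list(labels))
print(len(result))
```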

My DataFrame is set up this way for good reasons, so I don't want to restructure my data, which looks approximately like this:

        age
site             
mc03    0.39
mc03    0.348
mc03    0.348
mc03    0.42
mc04    0.78

I tried:

del df.iloc[0]

but this fails with:

AttributeError: __delitem__

Any other suggestions for how to accomplish this task?

Update:

I found two ways to do it, but neither is particularly elegant.

to_drop = [0, 4]
df = df.iloc[sorted(set(range(len(df))) - set(to_drop))]
# or:
df = df.iloc[[i for i in range(len(df)) if i not in to_drop]]

Maybe this is as good as it's going to get, though?
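For comparison, a boolean mask expresses the same positional drop without building a set or a list comprehension. This is only a sketch: the NumPy mask and the name `keep` are assumptions for illustration, and the frame is rebuilt from the sample data above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's data.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

to_drop = [0, 4]

# Start with "keep every row", then switch off the positions to drop.
keep = np.ones(len(df), dtype=bool)
keep[to_drop] = False

# Boolean indexing selects rows by position, so duplicate labels do not matter.
result = df[keep]
print(result)
```

The index is never consulted here, so the duplicate labels are left untouched on the surviving rows.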

user2285236 · Accepted Answer

This isn't very elegant either, but let me post it as an alternative:

df = df.reset_index().drop([0, 4]).set_index("site")

It temporarily moves the index into a regular column (leaving a default integer index), drops the rows by their integer labels, and then sets the original index back. The idea is from this answer.
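Here is a small runnable check of that round trip, assuming the sample data from the question (the frame construction and the name `result` are illustrative, not part of the original post):

```python
import pandas as pd

# Sample frame rebuilt from the question's data; the index is named "site".
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# reset_index() turns "site" into a column and gives rows the labels 0..4,
# so drop([0, 4]) now removes exactly those two rows.
result = df.reset_index().drop([0, 4]).set_index("site")
print(result)
```

Note that this relies on the index having a name ("site" here). With an unnamed index, `reset_index()` would create a column called "index" instead, and the final `set_index` call would need to use that name.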
