Reputation: 3190
I am looking for a way to delete rows in a pandas DataFrame when the index is not guaranteed to be unique.
For example, I want to drop the rows at positions 0 and 4 from my DataFrame df. This is the code you would typically use to do that:
df.drop(df.index[[0, 4]])
If every index label is unique, this works fine. However, if rows 0, 1, and 2 all share the same index label, this code drops rows 0, 1, 2, and 4, instead of just 0 and 4, because drop matches rows by label rather than by position.
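To make the behaviour concrete, here is a minimal sketch that rebuilds a frame like the one in the question (the construction itself is my assumption; only the values come from the post) and shows what a label-based drop does when labels repeat:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: a non-unique "site" index.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Positions 0 and 4 are translated into their labels: "mc03" and "mc04".
labels = df.index[[0, 4]]

# drop() removes every row whose label matches, not just positions 0 and 4.
result = df.drop(labels)

print(list(labels))
print(len(df), "rows before,", len(result), "rows after")
```

With this particular data, every row carries one of the two labels, so the label-based drop removes far more than the two rows that were intended.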
My DataFrame is set up this way for good reasons, so I don't want to restructure my data, which looks approximately like this:
      age
site
mc03  0.39
mc03  0.348
mc03  0.348
mc03  0.42
mc04  0.78
I tried:
del df.iloc[0]
but this fails with:
AttributeError: __delitem__
Any other suggestions for how to accomplish this task?
Update:
I found two ways to do it, but neither is particularly elegant.
to_drop = [0, 4]
df = df.iloc[sorted(set(range(len(df))) - set(to_drop))]
# or:
df = df.iloc[[i for i in range(len(df)) if i not in to_drop]]
Maybe this is as good as it's going to get, though?
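For what it's worth, the same positional idea can be wrapped in a small helper. This is only a sketch: drop_positions is a hypothetical name, not a pandas function, and it swaps the set arithmetic for np.delete on an array of row positions. The sample frame is an assumed reconstruction of the data above.

```python
import numpy as np
import pandas as pd


def drop_positions(frame, positions):
    """Return a new DataFrame without the rows at the given integer positions.

    Hypothetical helper for illustration; not part of pandas.
    """
    # All row positions, minus the ones to drop (order is preserved).
    keep = np.delete(np.arange(len(frame)), positions)
    return frame.iloc[keep]


# Assumed reconstruction of the question's data.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

result = drop_positions(df, [0, 4])
print(result)
```

Because it never looks at the labels, duplicate index values do not matter here, and the "site" index is left untouched.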
Upvotes: 2
Views: 1930
Reputation: 210852
Alternative solution using NumPy (assuming it is imported as np):
In [252]: mask = np.ones(len(df), dtype=bool)
In [253]: mask[[0,4]] = False
In [254]: mask
Out[254]: array([False, True, True, True, False], dtype=bool)
In [255]: df[mask]
Out[255]:
        age
mc03  0.348
mc03  0.348
mc03  0.420
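The session above as a standalone script, in case it helps. The DataFrame construction is my assumption about the asker's data; the masking steps are the ones shown in the session.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Start with "keep everything", then switch off the positions to drop.
mask = np.ones(len(df), dtype=bool)
mask[[0, 4]] = False

# Boolean indexing selects rows by position in the mask, so labels are ignored.
result = df[mask]
print(result)
```

The mask has to be exactly as long as the frame, which is why it is built from len(df).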
Upvotes: 0
Reputation:
This is not very elegant either, but let me post it as an alternative:
df = df.reset_index().drop([0, 4]).set_index("site")
It temporarily moves the "site" index into a regular column, leaving a default integer index whose labels match the row positions, then drops the rows and sets the original index back. The idea is from this answer.
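Spelled out step by step, the one-liner looks roughly like this. The frame is an assumed reconstruction of the asker's data, and the intermediate names flat and trimmed are only there for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Step 1: "site" becomes an ordinary column; the new index is 0..4 and unique.
flat = df.reset_index()

# Step 2: with a unique integer index, labels 0 and 4 are exactly rows 0 and 4.
trimmed = flat.drop([0, 4])

# Step 3: restore "site" as the index.
result = trimmed.set_index("site")
print(result)
```

Note that this relies on the index having a name ("site" here). With an unnamed index, reset_index() calls the new column "index", so the set_index call would need that name instead.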
Upvotes: 4