Reputation: 3667
I have a dataframe df
Name dist
aaaa 10
bbbb 11
cccc 41
dddd 77
I want to delete every row whose dist is less than 10 more than the dist of the row before it. The expected output is
Name dist
aaaa 10
cccc 41
dddd 77
To do this I used the following code
>>> for idx,row in df.iterrows():
...     if idx < df.shape[0]-1:
...         if ((df.ix[idx+1,'dist_to_TSS']-df.ix[idx+1,'dist_to_TSS'])<10):
...             df.drop(row)
...
But I get errors. Can you help?
Upvotes: 4
Views: 8966
Reputation: 1505
If your criterion for deciding which rows to drop is a little trickier, e.g. it depends on values in the previous or next row, an easy approach is to build up a list of the indexes of the rows you want to delete and then drop them all in one go at the end, e.g.
indexes_to_drop = []
for i in df.index:
    ....
    if {make your decision here}:
        indexes_to_drop.append(i)
    ....
# i comes from df.index, so the list holds index labels and can be passed to drop() directly
df.drop(indexes_to_drop, inplace=True)
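As a rough illustration, the template above could be filled in for the data in the question like this. The criterion (drop a row when its dist is less than 10 above the row directly before it) is my reading of the expected output, and the names prev_label, gap and result are made up for this sketch.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["aaaa", "bbbb", "cccc", "dddd"],
                   "dist": [10, 11, 41, 77]})

indexes_to_drop = []
prev_label = None  # hypothetical helper: label of the row visited just before
for label in df.index:
    # Assumed criterion: drop a row when its dist is less than 10
    # above the dist of the row directly before it.
    if prev_label is not None:
        gap = df.loc[label, "dist"] - df.loc[prev_label, "dist"]
        if gap < 10:
            indexes_to_drop.append(label)
    prev_label = label

# The list holds index labels, so it can be passed to drop() directly.
result = df.drop(indexes_to_drop)
print(result)
```

Collecting the labels first and dropping once at the end avoids changing the frame while the loop is still walking over it.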
Upvotes: 8
Reputation: 210912
If I understand correctly, you can do it this way:
In [110]: df = df[df.dist.diff().fillna(100) >= 10]
In [111]: df
Out[111]:
Name dist
0 aaaa 10
2 cccc 41
3 dddd 77
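Explanation: diff() gives each row's dist minus the previous row's dist; the first row has no previous row, so its NaN is filled with 100 (any value >= 10 would do) to keep that row: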
In [100]: df.dist.diff()
Out[100]:
0 NaN
1 1.0
2 30.0
3 36.0
Name: dist, dtype: float64
In [101]: df.dist.diff().fillna(100)
Out[101]:
0 100.0
1 1.0
2 30.0
3 36.0
Name: dist, dtype: float64
In [102]: df.dist.diff().fillna(100) >= 10
Out[102]:
0 True
1 False
2 True
3 True
Name: dist, dtype: bool
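For reference, here is the same idea as a self-contained script. The function name drop_close_rows and its parameters are invented for this sketch, and it keeps the first row by testing for NaN instead of filling it with 100, which should be equivalent as long as the column itself has no missing values.

```python
import pandas as pd

def drop_close_rows(frame, column="dist", min_gap=10):
    """Keep rows whose value is at least min_gap above the previous row's value."""
    gaps = frame[column].diff()
    # The first row has no predecessor (NaN gap), so it is always kept.
    keep = gaps.isna() | (gaps >= min_gap)
    return frame[keep]

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["aaaa", "bbbb", "cccc", "dddd"],
                   "dist": [10, 11, 41, 77]})
result = drop_close_rows(df)
print(result)
```

Note that diff() compares each row with the row directly before it in the original frame, not with the last row that was kept, so a run of closely spaced rows is dropped as a whole apart from its first row.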
Upvotes: 1