Ssank

Reputation: 3667

Dropping a row while iterating through a pandas DataFrame

I have a dataframe df

Name    dist
aaaa     10
bbbb     11
cccc     41
dddd     77

I want to delete rows whose dist differs from the dist of the row before it by less than 10. The expected output is

Name    dist
aaaa     10
cccc     41
dddd     77

To do this I used the following code

>>> for idx,row in df.iterrows():
...     if idx < df.shape[0]-1:
...             if ((df.ix[idx+1,'dist_to_TSS']-df.ix[idx+1,'dist_to_TSS'])<10):
...                     df.drop(row)
... 

But I get errors. Can you help?

Upvotes: 4

Views: 8966

Answers (2)

jacanterbury

Reputation: 1505

If your criterion for deciding which rows to drop is a little trickier, e.g. it depends on values in the previous or next row, an easy approach is to build up a list of the index labels of the rows you want to delete and then drop them all in one go at the end, e.g.:

indexes_to_drop = []

for i in df.index:
    ....
    if {make your decision here}:
        indexes_to_drop.append(i)
    ....

# i comes from df.index, so the list already holds index labels
df.drop(indexes_to_drop, inplace=True)
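
As a rough sketch of this pattern applied to the question's data: the name `min_gap` and the rule "drop a row when its dist is less than `min_gap` above the row before it" are my assumptions about what the asker wants, not something stated in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["aaaa", "bbbb", "cccc", "dddd"],
                   "dist": [10, 11, 41, 77]})

min_gap = 10  # assumed threshold, taken from the question's wording
indexes_to_drop = []

# Walk over consecutive pairs of index labels (previous, current)
# and only record which labels should go; nothing is dropped yet.
labels = list(df.index)
for prev, cur in zip(labels, labels[1:]):
    if df.loc[cur, "dist"] - df.loc[prev, "dist"] < min_gap:
        indexes_to_drop.append(cur)

# Drop by label, once, after the loop has finished.
result = df.drop(indexes_to_drop)
print(result)
```

Because the frame is not modified inside the loop, every comparison is made against the original rows, and `result` keeps the rows labelled 0, 2 and 3.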

Upvotes: 8

MaxU - stand with Ukraine

Reputation: 210912

If I understand correctly, you can do it this way:

In [110]: df = df[df.dist.diff().fillna(100) >= 10]

In [111]: df
Out[111]:
   Name  dist
0  aaaa    10
2  cccc    41
3  dddd    77

Explanation:

In [100]: df.dist.diff()
Out[100]:
0     NaN
1     1.0
2    30.0
3    36.0
Name: dist, dtype: float64

In [101]: df.dist.diff().fillna(100)
Out[101]:
0    100.0
1      1.0
2     30.0
3     36.0
Name: dist, dtype: float64

In [102]: df.dist.diff().fillna(100) >= 10
Out[102]:
0     True
1    False
2     True
3     True
Name: dist, dtype: bool
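
For anyone who wants to run the steps above as a script, here is a self-contained sketch. The names `min_gap`, `gap`, `keep` and `filtered` are mine, and instead of filling the first row's NaN with an arbitrary large number it keeps that row explicitly with `isna()`, which should give the same result here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["aaaa", "bbbb", "cccc", "dddd"],
                   "dist": [10, 11, 41, 77]})

min_gap = 10  # hypothetical name for the threshold used above

# Difference between each row and the one before it; NaN for the first row.
gap = df["dist"].diff()

# Keep the first row (no predecessor) and every row far enough from its predecessor.
keep = gap.isna() | (gap >= min_gap)

filtered = df[keep]
print(filtered)
```

Note that `diff()` always compares a row with the row directly before it in the original frame, even if that earlier row is itself filtered out.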

Upvotes: 1
