Dropping values in a DataFrame in a loop

Question

I have a dataframe with sorted values:

import numpy as np
import pandas as pd

sub_run = pd.DataFrame({'Runoff':[45,10,5,26,30,23,35], 'ind':[3, 10, 25,43,53,60,93]})

I would like to start from the highest value in Runoff (45) and drop all values whose "ind" differs from its "ind" by less than 30 (10 and 5), update the DataFrame, then go to the second-highest value (35) and drop the rows whose "ind" differs by less than 30, then go to the third-highest value (30) and drop 26 and 23, and so on. I wrote the following code:

pre_ind = []

for (idx1, row1) in sub_run.iterrows():
     var = row1.ind
     pre_ind.append(np.array(var))
     for (idx2,row2) in sub_run.iterrows():
         if (row2.ind != var) and (row2.ind not in pre_ind):
            test = abs(row2.ind - var)
            print("test" , test)
            if test <= 30:
                 sub_run = sub_run.drop(sub_run[sub_run.ind == row2.ind].index)

I expect the output to be the values [45, 35, 30]. However, I only get the first one.

Many thanks

bpfrd · Accepted Answer

Try this:

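list_pre_max = []  # Runoff values already processed as a maximum
while True:
    try:
        # Next-highest Runoff among the rows that are still present
        max_val = sub_run.Runoff.sort_values(ascending=False).iloc[len(list_pre_max)]
    except IndexError:
        # Every remaining row has already been processed
        break
    # .item() assumes this Runoff value occurs only once
    max_ind = sub_run.loc[sub_run['Runoff'] == max_val, 'ind'].item()
    list_pre_max.append(max_val)
    # Rows within 30 of max_ind, excluding the maximum itself and earlier maxima
    dropped_indices = sub_run.loc[(abs(sub_run['ind'] - max_ind) <= 30) & (sub_run['ind'] != max_ind) & (~sub_run.Runoff.isin(list_pre_max))].index
    sub_run.drop(index=dropped_indices, inplace=True)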

Output:

>>> sub_run
   Runoff  ind
0      45    3
4      30   53
6      35   93
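
For comparison, here is a minimal sketch of the same greedy idea that leaves the original DataFrame untouched. It is not the code from the answer above: the function name keep_spaced_peaks and the window parameter are made up for illustration, and it assumes the index labels are unique. Rows are visited from highest to lowest Runoff, and a row is kept only if its "ind" is more than window away from every row kept so far.

```python
import pandas as pd


def keep_spaced_peaks(df, window=30):
    """Hypothetical helper: greedily keep the highest Runoff rows whose
    'ind' values are more than `window` apart from each other."""
    kept_labels = []  # index labels of the rows we keep
    kept_inds = []    # 'ind' values of the rows we keep
    # Visit rows from the highest Runoff to the lowest
    for label, row in df.sort_values('Runoff', ascending=False).iterrows():
        # Keep the row only if it is far enough from every kept row
        if all(abs(row['ind'] - k) > window for k in kept_inds):
            kept_labels.append(label)
            kept_inds.append(row['ind'])
    # Return the kept rows in their original order (assumes unique labels)
    return df.loc[df.index.isin(kept_labels)]


sub_run = pd.DataFrame({'Runoff': [45, 10, 5, 26, 30, 23, 35],
                        'ind': [3, 10, 25, 43, 53, 60, 93]})
result = keep_spaced_peaks(sub_run)
print(result['Runoff'].tolist())  # [45, 30, 35]
```

Because the function returns a filtered copy instead of calling drop inside the loop, the rows being iterated never change while the loop runs.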
