Reputation: 309
I have a dataframe with sorted values:
import numpy as np
import pandas as pd
sub_run = pd.DataFrame({'Runoff':[45,10,5,26,30,23,35], 'ind':[3, 10, 25,43,53,60,93]})
I would like to start from the highest value in Runoff (45) and drop all rows whose "ind" differs from its "ind" by less than 30 (the rows with Runoff 10 and 5), update the DataFrame, then move to the second-highest value (35) and drop the rows whose "ind" difference is < 30, then to the third-highest value (30) and drop 26 and 23, and so on. I wrote the following code:
pre_ind = []
for (idx1, row1) in sub_run.iterrows():
    var = row1.ind
    pre_ind.append(np.array(var))
    for (idx2, row2) in sub_run.iterrows():
        if (row2.ind != var) and (row2.ind not in pre_ind):
            test = abs(row2.ind - var)
            print("test", test)
            if test <= 30:
                sub_run = sub_run.drop(sub_run[sub_run.ind == row2.ind].index)
I expect the output to contain the values [45,35,30]. However, I only get the first one.
Many thanks
Upvotes: 3
Views: 89
Reputation: 141
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
In your case, the modification of sub_run has no immediate effect on the iteration. Therefore, in the outer loop, after the iteration on (45, 3), the next row iterated is (35, 93), followed by (30, 53), (26, 43), (23, 60), (10, 10) and (5, 25), so rows that were already dropped are still visited. For the inner loop, your modification works, since a new loop over sub_run is entered on every iteration of the outer loop.
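To make the point above concrete, here is a minimal sketch (the names `df` and `visited` are illustrative, not from the question). It assumes only that the loop's iterator is created once from the original frame, so rebinding the name inside the loop does not shorten the iteration:

```python
import pandas as pd

df = pd.DataFrame({'Runoff': [45, 10, 5], 'ind': [3, 10, 25]})

visited = []
for idx, row in df.iterrows():
    visited.append(int(row.ind))
    # Rebinding df to a smaller frame does not touch the iterator,
    # which was created from the original three-row frame.
    df = df.drop(df[df.ind != 3].index)

print(visited)   # all three original rows are still visited
print(len(df))   # only the row with ind == 3 remains
```

The outer loop therefore keeps handing out rows that no longer exist in the rebound frame, which is why the two loops in the question disagree about what is left.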
Here is my suggested code, inspired by bubble sort.
import pandas as pd
sub_run = pd.DataFrame({'Runoff': [45,10,5,26,30,23,35],
                        'ind': [3,10,25,43,53,60,93]})
sub_run = sub_run.sort_values(by=['Runoff'], ascending=False)
highestRow = 0
while highestRow < len(sub_run) - 1:
    cur_run = sub_run
    highestRunoffInd = cur_run.iloc[highestRow].ind
    for i in range(highestRow + 1, len(cur_run)):
        ind = cur_run.iloc[i].ind
        if abs(ind - highestRunoffInd) <= 30:
            sub_run = sub_run.drop(sub_run[sub_run.ind == ind].index)
    highestRow += 1
print(sub_run)
Output:
   Runoff  ind
0      45    3
6      35   93
4      30   53
Upvotes: 1
Reputation: 1025
Try this:
list_pre_max = []
while True:
    try:
        # Next-highest Runoff value that has not been processed yet
        max_val = sub_run.Runoff.sort_values(ascending=False).iloc[len(list_pre_max)]
    except IndexError:
        # Every remaining row has been processed
        break
    max_ind = sub_run.loc[sub_run['Runoff'] == max_val, 'ind'].item()
    list_pre_max.append(max_val)
    # Rows within 30 of max_ind, excluding the current and earlier maxima
    dropped_indices = sub_run.loc[(abs(sub_run['ind']-max_ind) <= 30) & (sub_run['ind'] != max_ind) & (~sub_run.Runoff.isin(list_pre_max))].index
    sub_run.drop(index=dropped_indices, inplace=True)
Output:
>>> sub_run
   Runoff  ind
0      45    3
4      30   53
6      35   93
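For comparison, the same greedy rule can be sketched without dropping rows during iteration: walk the rows in descending Runoff order and keep a row only if its "ind" is more than 30 away from every row already kept. The function name `select_peaks` and the parameter `min_gap` are hypothetical names introduced for this sketch, and it treats a difference of exactly 30 as too close, matching the `<= 30` test used in the code above.

```python
import pandas as pd

def select_peaks(frame, min_gap=30):
    """Keep rows greedily by descending Runoff, skipping any row whose
    'ind' lies within min_gap of an already kept row."""
    kept_labels = []
    kept_inds = []
    for label, row in frame.sort_values('Runoff', ascending=False).iterrows():
        # Keep the row only if it is far enough from every kept row.
        if all(abs(row['ind'] - k) > min_gap for k in kept_inds):
            kept_labels.append(label)
            kept_inds.append(row['ind'])
    return frame.loc[kept_labels]

sub_run = pd.DataFrame({'Runoff': [45, 10, 5, 26, 30, 23, 35],
                        'ind': [3, 10, 25, 43, 53, 60, 93]})
result = select_peaks(sub_run)
print(result['Runoff'].tolist())  # [45, 35, 30]
```

Because the input frame is never modified, there is no interaction between the iteration and the drops; the result is built from the kept labels at the end.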
Upvotes: 1