I am randomly removing rows from my data array and recalculating a metric (the sum) in a for-loop. In each iteration I shuffle the data, remove the first row, and sum the values in the remaining rows. For each iteration (run) I want to record the run number, the sum of the remaining points, and the row that was removed. I do this by building a dictionary of results for each run and appending it to a list. However, when I print the list of result dictionaries, the run number and sum are correct, but the removed row in every dictionary is the row removed in the last run instead of the row removed in that specific run.
import random
import numpy as np

Points = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5],[6,6,6,6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(Points)
    removed_row = Points[:index]
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = Points[index:]
    sum = np.sum(remaining_rows)
    run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}
    all_results.append(run_results)
print(all_results)
Output:
Removed row from run 0: [[2 2 2 2]]
Removed row from run 1: [[2 2 2 2]]
Removed row from run 2: [[4 4 4 4]]
Removed row from run 3: [[5 5 5 5]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 76, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 1, 'Sum of remaining points': 76, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 2, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 3, 'Sum of remaining points': 64, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 4, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}]
As you can see, every dictionary ends up with the last run's removed_row rather than the row removed in that particular run.
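Note that shuffling the whole array on every iteration is comparatively slow. Here's an approach that vectorizes most of the work. It picks all the rows to drop up front and removes them cumulatively (run i has dropped the rows of runs 0 to i):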
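import numpy as np

Points = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5],[6,6,6,6]])
N = 5
# choose the (distinct) row indices to drop first:
to_drop = np.random.choice(len(Points), N, replace=False)
# total remaining after dropping rows to_drop[0], ..., to_drop[i]:
remains = Points.sum() - Points[to_drop].sum(axis=1).cumsum()
out = [{'run': i, 'sum_remaining': remains[i], 'removed row': Points[to_drop[i]]} for i in range(N)]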
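If each run should instead start from the full array, as in the question's loop (where the same row can be removed in more than one run), a similar vectorized sketch could draw one row index per run with replacement. This assumes that interpretation; the names rng, to_drop, remaining and removed are illustrative. Picking a uniformly random row has the same distribution as shuffling and taking the first row, so no shuffle is needed:

```python
import numpy as np

# Same data as in the question; N independent runs.
points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3],
                   [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
N = 5
rng = np.random.default_rng()

# One random row index per run, drawn with replacement, so every run
# starts from the full array (rows may repeat across runs).
to_drop = rng.integers(len(points), size=N)

# Per run: total of the full array minus the total of the dropped row.
remaining = points.sum() - points[to_drop].sum(axis=1)

# Fancy indexing with an index array returns a new array (a copy),
# so the stored rows cannot change behind our back.
removed = points[to_drop]

results = [
    {"Run": i, "Sum of remaining points": int(remaining[i]), "Removed row": removed[i]}
    for i in range(N)
]
for r in results:
    print(r)
```

Here each stored row is 1-D (shape (4,)) rather than (1, 4) as in the loop version.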
Well, it is indeed tricky!
The problem is with the assignment:
run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}
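This stores a reference to removed_row, and removed_row = Points[:index] is a view into Points: basic slicing in NumPy returns a view that shares memory with the original array, not a copy. Because np.random.shuffle reorders Points in place, every stored view shows whatever row happens to be first after the latest shuffle, which is why all the dictionaries end up with the last run's row.
Instead, store a new array with np.array(removed_row) (or removed_row.copy()):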
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(points)
    removed_row = points[:index]
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = points[index:]
    sum = np.sum(remaining_rows)
    # np.array(...) copies the data, so later shuffles cannot change the stored row
    run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": np.array(removed_row)}
    all_results.append(run_results)
print(all_results)
Output:
Removed row from run 0: [[4 4 4 4]]
Removed row from run 1: [[6 6 6 6]]
Removed row from run 2: [[6 6 6 6]]
Removed row from run 3: [[3 3 3 3]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 1, 'Sum of remaining points': 60, 'Removed row': array([[6, 6, 6, 6]])}, {'Run': 2, 'Sum of remaining points': 60, 'Removed row': array([[6, 6, 6, 6]])}, {'Run': 3, 'Sum of remaining points': 72, 'Removed row': array([[3, 3, 3, 3]])}, {'Run': 4, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}]
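A minimal sketch of the aliasing described above: basic slicing such as points[:1] returns a view, while np.array(...) and .copy() return independent copies. To keep the output reproducible, a fixed in-place row swap stands in for np.random.shuffle here (an illustrative substitute); the variable names are hypothetical:

```python
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

view = points[:1]              # basic slicing: a view into points' memory
copy = np.array(points[:1])    # np.array(...) copies by default
also_copy = points[:1].copy()  # explicit copy

print(np.shares_memory(view, points))  # True
print(np.shares_memory(copy, points))  # False

# Reorder points in place (rows 0 and 2 swapped), as a shuffle would.
points[[0, 2]] = points[[2, 0]]

print(view)  # [[3 3 3 3]]  the "stored" row changed
print(copy)  # [[1 1 1 1]]  the copy kept its original value
```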
If each run should continue from the rows left over by the previous run, you need to actually remove the row from Points:
import numpy as np

Points = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5],[6,6,6,6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(Points)
    removed_row = Points[:index]
    Points = Points[index:]  # <<<<<<< Remove row
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = Points  # the removed row is already gone from Points
    sum = np.sum(remaining_rows)
    run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}
    all_results.append(run_results)
print(all_results)
Output
Removed row from run 0: [[3 3 3 3]]
Removed row from run 1: [[6 6 6 6]]
Removed row from run 2: [[1 1 1 1]]
Removed row from run 3: [[2 2 2 2]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 72, 'Removed row': array([[3, 3, 3, 3]])},
{'Run': 1, 'Sum of remaining points': 48, 'Removed row': array([[6, 6, 6, 6]])},
{'Run': 2, 'Sum of remaining points': 44, 'Removed row': array([[1, 1, 1, 1]])},
{'Run': 3, 'Sum of remaining points': 36, 'Removed row': array([[2, 2, 2, 2]])},
{'Run': 4, 'Sum of remaining points': 20, 'Removed row': array([[4, 4, 4, 4]])}]
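For cumulative removal that does not depend on which slices share memory, one option (a swapped-in technique, not what the answer above does) is to pick a random row index each run and build the smaller array with np.delete, which returns a new array. A sketch, assuming NumPy's Generator API (np.random.default_rng); the stored row is 1-D with shape (4,) rather than (1, 4):

```python
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3],
                   [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
N = 5
rng = np.random.default_rng()

all_results = []
remaining_rows = points
for i in range(N):
    # Pick one of the remaining rows uniformly at random.
    k = rng.integers(len(remaining_rows))
    # Explicit copy: the stored row is independent of any later changes.
    removed_row = remaining_rows[k].copy()
    # np.delete returns a new array without row k; the input is untouched.
    remaining_rows = np.delete(remaining_rows, k, axis=0)
    all_results.append({
        "Run": i,
        "Sum of remaining points": int(remaining_rows.sum()),
        "Removed row": removed_row,
    })

for r in all_results:
    print(r)
```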