Ryan E Lima

Reputation: 25

Adding Dictionaries to a list within a for-loop, values of last dict.key of each run not being added correctly

I am randomly removing rows from my data array and recalculating a metric (the sum) in a for-loop. In each iteration I shuffle the data and remove the first row, then sum the values in the remaining rows. For each run I want to keep track of the run number, the sum of the remaining points, and the row that was removed. I do this by creating a dictionary of results for each run and appending it to a list. However, when I print the list of result dictionaries, the run number and sum are correct, but the removed row stored in every dictionary is the row from the last run instead of the row from its own run.

import random
import numpy as np

Points = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5],[6,6,6,6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(Points)
    removed_row = Points[:index]
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = Points[index:]
    sum = np.sum(remaining_rows)
    run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}
    all_results.append(run_results)
print(all_results)

#output
Removed row from run 0: [[2 2 2 2]]
Removed row from run 1: [[2 2 2 2]]
Removed row from run 2: [[4 4 4 4]]
Removed row from run 3: [[5 5 5 5]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 76, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 1, 'Sum of remaining points': 76, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 2, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 3, 'Sum of remaining points': 64, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 4, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}]

As you can see, every dictionary ends up holding the 'removed_row' value from the last run rather than the run-specific 'removed_row'.

Upvotes: 2

Views: 36

Answers (3)

Quang Hoang

Reputation: 150765

Note that shuffling the whole array on every run is expensive. Here's an approach that vectorizes most of the work:

# choose the indices to drop first (no repeats, so N must not exceed len(Points)):
to_drop = np.random.choice(np.arange(len(Points)), N, replace=False)

# per-column sums remaining after cumulatively dropping rows to_drop[0..i]:
remains = Points.sum(0) - Points[to_drop].cumsum(0)

out = [{'run': i, 'sum_remaining': remains[i], 'removed row': Points[to_drop[i]]} for i in range(N)]
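If each run is meant to be independent, as in the question's loop (one row removed from the full array every time), the same idea can be sketched without any cumulative sum. This is only an illustration: the seeded generator, the names dropped and sums, and the use of rng.integers to allow repeated rows are assumptions, not part of the answer above.

```python
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3],
                   [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
N = 5

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# One independently chosen row index per run (repeats are allowed,
# just as reshuffling the full array each run can pick the same row twice).
dropped = rng.integers(0, len(points), size=N)

# Grand total minus the sum of the dropped row, computed for all runs at once.
sums = points.sum() - points[dropped].sum(axis=1)

results = [
    {'Run': i,
     'Sum of remaining points': int(sums[i]),
     'Removed row': points[dropped[i]].copy()}  # copy so the entry owns its data
    for i in range(N)
]
print(results)
```

Because points is never shuffled here, the stored rows cannot change afterwards; the copy is kept only as a safeguard.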

Upvotes: 1

Aviv Yaniv

Reputation: 6298

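Well, it is indeed tricky!

The problem is with the assignment:

run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}

This stores a reference to removed_row rather than a copy of its values. removed_row is a slice of the points array, and a NumPy slice is a view onto the same memory, so every later in-place shuffle changes what all previously stored rows show.

Instead, create a new array with np.array(removed_row), which copies the data by default: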

import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(points)      # shuffles the rows in place
    removed_row = points[:index]   # a view into points, not a copy
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = points[index:]
    total = np.sum(remaining_rows)  # named total to avoid shadowing the built-in sum
    # np.array(...) copies the view, so later shuffles cannot change the stored row
    run_results = {'Run': i, "Sum of remaining points": total, "Removed row": np.array(removed_row)}
    all_results.append(run_results)
print(all_results)

Output:

Removed row from run 0: [[4 4 4 4]]
Removed row from run 1: [[6 6 6 6]]
Removed row from run 2: [[6 6 6 6]]
Removed row from run 3: [[3 3 3 3]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}, {'Run': 1, 'Sum of remaining points': 60, 'Removed row': array([[6, 6, 6, 6]])}, {'Run': 2, 'Sum of remaining points': 60, 'Removed row': array([[6, 6, 6, 6]])}, {'Run': 3, 'Sum of remaining points': 72, 'Removed row': array([[3, 3, 3, 3]])}, {'Run': 4, 'Sum of remaining points': 68, 'Removed row': array([[4, 4, 4, 4]])}]
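The difference between a view and a copy can be checked directly. The following is a minimal sketch with a smaller, made-up array; np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

view = points[:1]             # basic slicing returns a view on the same buffer
snapshot = points[:1].copy()  # an independent copy of the current first row

print(np.shares_memory(points, view))      # True
print(np.shares_memory(points, snapshot))  # False

# Overwrite the first row in place, as an in-place shuffle may do.
points[0] = [9, 9, 9, 9]

print(view)      # [[9 9 9 9]]  the view follows the change
print(snapshot)  # [[1 1 1 1]]  the copy keeps the old values
```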

Upvotes: 1

Mike67

Reputation: 11342

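You need to actually remove the row from Points. Once Points is rebound to the remaining rows, later in-place shuffles no longer touch the removed row, so each stored row keeps its value. Note that the removal is then cumulative across runs: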

import numpy as np

Points = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5],[6,6,6,6]])
index = 1
all_results = []
N = 5
for i in range(N):
    np.random.shuffle(Points)
    removed_row = Points[:index]
    Points = Points[index:]     # <<<<<<< Remove row
    print(f'Removed row from run {i}: {removed_row}')
    remaining_rows = Points     # the removed row is already gone, so do not slice again
    sum = np.sum(remaining_rows)
    run_results = {'Run':i,"Sum of remaining points": sum ,"Removed row": removed_row}
    all_results.append(run_results)
print(all_results)

Output

Removed row from run 0: [[3 3 3 3]]
Removed row from run 1: [[6 6 6 6]]
Removed row from run 2: [[1 1 1 1]]
Removed row from run 3: [[2 2 2 2]]
Removed row from run 4: [[4 4 4 4]]
[{'Run': 0, 'Sum of remaining points': 72, 'Removed row': array([[3, 3, 3, 3]])},
 {'Run': 1, 'Sum of remaining points': 48, 'Removed row': array([[6, 6, 6, 6]])},
 {'Run': 2, 'Sum of remaining points': 44, 'Removed row': array([[1, 1, 1, 1]])},
 {'Run': 3, 'Sum of remaining points': 36, 'Removed row': array([[2, 2, 2, 2]])},
 {'Run': 4, 'Sum of remaining points': 20, 'Removed row': array([[4, 4, 4, 4]])}]
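If the runs should stay independent instead, with every run starting from the full array, one possible variation is to shuffle a fresh copy on each iteration. This is a sketch rather than part of the answer above; the seeded generator and the name shuffled are assumptions made for illustration. Generator.permutation returns a new shuffled array and leaves its input untouched, so each stored row belongs to its own array.

```python
import numpy as np

points = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3],
                   [4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
index = 1
N = 5
rng = np.random.default_rng(1)  # seed chosen only to make the sketch repeatable

all_results = []
for i in range(N):
    shuffled = rng.permutation(points)  # new array each run; points is not modified
    removed_row = shuffled[:index]      # view into this run's own array
    remaining_rows = shuffled[index:]
    all_results.append({'Run': i,
                        'Sum of remaining points': int(remaining_rows.sum()),
                        'Removed row': removed_row})
print(all_results)
```

Each dictionary now holds a view into a different array, so no later iteration can overwrite it.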

Upvotes: 1
