Andrew Earl
Andrew Earl

Reputation: 363

Storing the values of an iteration to a numpy array - Python, Numpy

So I'm struggling to make this code more useful for larger data-sets. Here is the code, I will explain it thoroughly afterwards:

import numpy as np
np.set_printoptions(threshold='nan')

tri_nums = [3, 2, 1]


paths = [1, 3, 4, 5]

vol_list = [10, 10, 10, 15, 15, 25]

n = 0

array_list = []

while n <= len(tri_nums):
    for num in tri_nums:
        print "assigning volume", vol_list[sum(tri_nums)-len(vol_list)+1]
        volume_array = np.zeros(shape = (6, 6))
        volume_array[paths[num-1],paths[num]] = vol_list[sum(tri_nums)-len(vol_list)+1]
        array_list.append(volume_array)
        print paths[num-1], paths[num]


    tri_nums.pop(0)
    paths.pop(0)
    n+=1
    print paths
    print tri_nums


final_array = sum(array_list)
print array_list
print final_array

Starting with tri_nums: The values of tri_nums will always be a list of the triangular numbers of the length of paths. So a paths list of say, [1, 3, 4, 5, 6, 8], will give a tri_nums of [5, 4, 3, 2, 1].

tri_nums are also correlated to the number of values in vol_list. As you can see, there are three 10's in vol_list. The number of 10's is equal to the first value of tri_nums. There are also two 15's and a 2 for the second value of tri_nums. This pattern will never change! Another example of this is:

paths = [1, 3, 4, 5, 6, 8]

tri_nums = [5, 4, 3, 2, 1]

vol_list = [15, 15, 15, 15, 15, 250, 250, 250, 250, 25, 25, 25, 10, 10, 15]

The list paths (in the original case) is made up of four 'nodes', nodes 1,3,4 and 5. Between each adjacent node there is a path, i.e. path 1-3, 3-4, 4-5.

As one can see, volume_array is a 6x6 array and is made up of zeros. The row values in volume_array that are to be changed correspond to the first value of each path i.e. 1,3, 4. The column values correspond to the second number of each path i.e. 3, 4,5.

Here comes the tricky bit!

The values in vol_list are allocated to the aforementioned array items as follows:

  1. For each value of tri_nums a value in vol_list is added to volume_array. The row value within this array is defined by the first value of a path i.e. [4]and the column value is defined by the second value of a path (for the value [4] this will mean [5]).
  2. For tri_nums[0], the value 10 is added three times, once to volume_array[4][5], once to volume_array[3][4] and once to volume_array[1][3].
  3. For tri_nums[1] the value 15 is added twice, once to volume_array[4][5] and once to volume_array[3][4].
  4. For tri_nums[2] the value 25 is added once to volume_array[4][5].
  5. Finally, all of the values in of the arrays generated in the previous three steps are added together to get final_array.

Another thing worth mentioning is that the sum of tri_nums is equal to len(vol_list). Furthermore tri_nums[n] is always > tri_nums[n+1].

Ideally I would like to implement this code for path's, tri_num's and vol_list's with hundreds of items in them. The method I am using now would mean I need to make hundreds of while loops by hand. How can I make the while loops work simultaneously so I can avoid the "hundreds of while loops" scenario?

Everything is working just fine, but the final output is:

[[  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.  10.   0.   0.]
 [  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.  25.   0.]
 [  0.   0.   0.   0.   0.  25.]
 [  0.   0.   0.   0.   0.   0.]]

Meaning that the final value of vol_list which is (25) has not been assigned to array_list[4][5] and thus was not in final_array. It just needs to do one more loop and it'll work, I'm not sure how to get it to do the last loop though.

Please let me know if anything is unclear!

Thanks

Upvotes: 1

Views: 5935

Answers (1)

David Z
David Z

Reputation: 131550

The reason you miss the last element of your array is that you're incrementing n at the same time you're popping elements off of tri_nums. Look at the values of n and tri_nums at the beginning of each iteration of your while loop:

iteration    n    tri_nums    n <= len(tri_nums)
0            0    [3, 2, 1]   True
1            1    [2, 1]      True
2            2    [1]         False

You should either keep n at 0, and make your condition while tri_nums (which is equivalent to while len(tri_nums) > 0), or probably better, you should avoid modifying tri_nums and just use a for loop. You would then need to modify the inner loop to only iterate over part of tri_nums each time, like so:

for n in xrange(len(tri_nums)):
    for num in tri_nums[n:]:

That being said, the whole approach of iterating over lists, creating multiple arrays, and adding them all up is quite inefficient. Because this isn't Code Review, I won't get into all the inefficiencies, but there are a few key ones I want to mention:

  • You have a lot of structure in your input data that you could take advantage of
  • You should try to use Numpy vectorized operations instead of native Python operations as much as possible
  • You keep putting numbers at the same indices, so you can add up the numbers first and only create the array at the end

With all that in mind, I'd recommend changing your code so that vol_list only contains each number once.

vol_list = [10, 15, 25]

You can then construct the array you need by adding your numbers up first, and then sticking the resulting sums in the array. Numpy conveniently includes the cumsum function to compute partial sums of an array:

>>> np.cumsum([10, 15, 25])
array([10, 25, 50])

and it allows you to specify many values at once in its indexing operations. So your entire algorithm can be reduced to this:

final_array = np.zeros((6, 6))
final_array[paths[:-1], paths[1:]] = np.cumsum(vol_list)

If your memory requirements become problematic for long lists, you might want to use Scipy's sparse matrices for storage, instead of ordinary Numpy arrays.

Upvotes: 1

Related Questions