Reputation: 363
So I'm struggling to make this code more useful for larger data-sets. Here is the code, I will explain it thoroughly afterwards:
import numpy as np
np.set_printoptions(threshold='nan')
tri_nums = [3, 2, 1]
paths = [1, 3, 4, 5]
vol_list = [10, 10, 10, 15, 15, 25]
n = 0
array_list = []
while n <= len(tri_nums):
for num in tri_nums:
print "assigning volume", vol_list[sum(tri_nums)-len(vol_list)+1]
volume_array = np.zeros(shape = (6, 6))
volume_array[paths[num-1],paths[num]] = vol_list[sum(tri_nums)-len(vol_list)+1]
array_list.append(volume_array)
print paths[num-1], paths[num]
tri_nums.pop(0)
paths.pop(0)
n+=1
print paths
print tri_nums
final_array = sum(array_list)
print array_list
print final_array
Starting with tri_nums
:
The values of tri_nums
will always be a list of the triangular numbers of the length of paths
. So a paths
list of say, [1, 3, 4, 5, 6, 8]
, will give a tri_nums
of [5, 4, 3, 2, 1]
.
tri_nums
are also correlated to the number of values in vol_list
. As you can see, there are three 10
's in vol_list
. The number of 10
's is equal to the first value of tri_nums
. There are also two 15
's and a 2
for the second value of tri_nums
. This pattern will never change! Another example of this is:
paths = [1, 3, 4, 5, 6, 8]
tri_nums = [5, 4, 3, 2, 1]
vol_list = [15, 15, 15, 15, 15, 250, 250, 250, 250, 25, 25, 25, 10, 10, 15]
The list paths
(in the original case) is made up of four 'nodes', nodes 1,3,4 and 5. Between each adjacent node there is a path, i.e. path 1-3, 3-4, 4-5.
As one can see, volume_array
is a 6x6 array and is made up of zeros. The row values in volume_array
that are to be changed correspond to the first value of each path i.e. 1
,3
, 4
. The column values correspond to the second number of each path i.e. 3
, 4
,5
.
Here comes the tricky bit!
The values in vol_list
are allocated to the aforementioned array items as follows:
tri_nums
a value in vol_list
is added to volume_array
. The row value within this array is defined by the first value of a path i.e. [4]
and the column value is defined by the second value of a path (for the value [4]
this will mean [5]
). tri_nums[0]
, the value 10
is added three times, once to volume_array[4][5]
, once to volume_array[3][4]
and once to volume_array[1][3]
. tri_nums[1]
the value 15
is added twice, once to volume_array[4][5]
and once to volume_array[3][4]
.tri_nums[2]
the value 25
is added once to volume_array[4][5]
. final_array
. Another thing worth mentioning is that the sum of tri_nums
is equal to len(vol_list)
. Furthermore tri_nums[n]
is always > tri_nums[n+1]
.
Ideally I would like to implement this code for path
's, tri_num
's and vol_list
's with hundreds of items in them. The method I am using now would mean I need to make hundreds of while loops by hand. How can I make the while
loops work simultaneously so I can avoid the "hundreds of while loops" scenario?
Everything is working just fine, but the final output is:
[[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 10. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 25. 0.]
[ 0. 0. 0. 0. 0. 25.]
[ 0. 0. 0. 0. 0. 0.]]
Meaning that the final value of vol_list
which is (25
) has not been assigned to array_list[4][5]
and thus was not in final_array
. It just needs to do one more loop and it'll work, I'm not sure how to get it to do the last loop though.
Please let me know if anything is unclear!
Thanks
Upvotes: 1
Views: 5935
Reputation: 131550
The reason you miss the last element of your array is that you're incrementing n
at the same time you're popping elements off of tri_nums
. Look at the values of n
and tri_nums
at the beginning of each iteration of your while
loop:
iteration n tri_nums n <= len(tri_nums)
0 0 [3, 2, 1] True
1 1 [2, 1] True
2 2 [1] False
You should either keep n
at 0, and make your condition while tri_nums
(which is equivalent to while len(tri_nums) > 0
), or probably better, you should avoid modifying tri_nums
and just use a for
loop. You would then need to modify the inner loop to only iterate over part of tri_nums
each time, like so:
for n in xrange(len(tri_nums)):
for num in tri_nums[n:]:
That being said, the whole approach of iterating over lists, creating multiple arrays, and adding them all up is quite inefficient. Because this isn't Code Review, I won't get into all the inefficiencies, but there are a few key ones I want to mention:
With all that in mind, I'd recommend changing your code so that vol_list
only contains each number once.
vol_list = [10, 15, 25]
You can then construct the array you need by adding your numbers up first, and then sticking the resulting sums in the array. Numpy conveniently includes the cumsum
function to compute partial sums of an array:
>>> np.cumsum([10, 15, 25])
array([10, 25, 50])
and it allows you to specify many values at once in its indexing operations. So your entire algorithm can be reduced to this:
final_array = np.zeros((6, 6))
final_array[paths[:-1], paths[1:]] = np.cumsum(vol_list)
If your memory requirements become problematic for long lists, you might want to use Scipy's sparse matrices for storage, instead of ordinary Numpy arrays.
Upvotes: 1