MadisonCooper

Reputation: 236

Python List of Lists vs numpy

So, I have a snippet of a script:

lol = []
latv1 = 0
latv2 = 0
latv3 = 0

#Loop a
for a in range(100):

    #Refresh latv2 after each iteration of loop a
    latv2 = 0

    #Loop b
    for b in range(100):

        #Refresh latv3 after each iteration of loop b
        latv3 = 0

        #Loop c        
        for c in range(100):

            #Make 4 value list according to iteration and append to lol
            midl2 = [latv1,latv2,latv3,0]
            lol.append(midl2)

            #Iterate after loop
            latv3 = latv3 + 1
        latv2 = latv2 + 1
    latv1 = latv1 + 1

This does what I want, but very slowly. It gives:

[[0,0,0,0]
 [0,0,1,0]
 ...
 [0,1,0,0]
 [0,1,1,0]
 ...
 [99,99,98,0]
 [99,99,99,0]]

I've read about NumPy and its speed and optimization, but I cannot figure out how to implement what I have above with it. From the manuals, I've learned how to make an array of zeros:

numArr = np.zeros((100,4))

To give:

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 ..., 
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]

and can change the values of each column with:

numpA  = np.arange(0,100,1)
numpB  = np.arange(0,100,1)
numpC  = np.arange(0,100,1)
numArr[:,0] = numpA
numArr[:,1] = numpB
numArr[:,2] = numpC

giving:

[[   0.    0.    0.    0.]
 [   1.    1.    1.    0.]
 [   2.    2.    2.    0.]
 ..., 
 [ 997.  997.  997.    0.]
 [ 998.  998.  998.    0.]
 [ 999.  999.  999.    0.]]

but I cannot create a NumPy array 1,000,000 rows long whose columns increment the way the original example does. If I create the zero array with 1000000 rows instead of 100, the column substitution fails, which makes sense because the array and the substituted columns have different lengths, but I am not sure how to build the substitution arrays so that they work.

How can I replicate the original script's output via NumPy arrays?

Note: This is a Python 2.7 machine, but it's 64-bit at least. I know RAM use is an issue, but I should be able to change the dtype of the array to fit my needs.

Upvotes: 1

Views: 530

Answers (1)

Divakar

Reputation: 221504

Approach #1

To create the NumPy equivalent of the posted code and have a NumPy array as output, you could make use of itertools, like so:

from itertools import product

import numpy as np

N = 100  # matches range(100) in the posted code
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = list(product(np.arange(N), repeat=3))

Please note that it would be N = 100 to make it equivalent to the posted code.
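
As a quick sanity check, here is a minimal sketch that compares Approach #1 against a plain triple loop for a small size. The function names nested_loops and itertools_based and the value N = 4 are chosen here only for illustration, and the triple loop is a condensed version of the posted code rather than a verbatim copy.

```python
from itertools import product

import numpy as np


def nested_loops(N):
    """Reference: the question's triple loop, condensed to plain ranges."""
    rows = []
    for a in range(N):
        for b in range(N):
            for c in range(N):
                rows.append([a, b, c, 0])
    return rows


def itertools_based(N):
    """Approach #1: fill the first three columns from itertools.product."""
    out = np.zeros((N**3, 4), dtype=int)
    out[:, :3] = list(product(range(N), repeat=3))
    return out


N = 4  # small illustrative size so the check runs instantly
assert itertools_based(N).tolist() == nested_loops(N)
```

If the assertion passes, both versions produce the same rows in the same order.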

Approach #2

Another, potentially faster, approach uses only NumPy and its vectorized broadcasting capabilities:

out = np.zeros((N**3,4),dtype=int)
# // is integer (floor) division in both Python 2 and Python 3
out[:,:3] = (np.arange(N**3)[:,None]//[N**2,N,1])%N

I would expect this to be faster than the itertools-based approach, because that one first builds a list of tuples, which then has to be converted and copied into a NumPy array. The next section tests this theory.
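
One way to read the broadcasting expression is as writing each row index in base N: dividing by N**2, N and 1 and taking the remainder modulo N yields the three "digits". Here is a small sketch of that idea; N = 3 and the names k, divisors and digits are picked only for illustration, and // is used so that the division stays integer on Python 3 as well.

```python
import numpy as np

N = 3  # illustrative small size
k = np.arange(N**3)[:, None]        # row indices as a column, shape (N**3, 1)
divisors = np.array([N**2, N, 1])   # place values of the three base-N digits
digits = (k // divisors) % N        # broadcasts to shape (N**3, 3)

# Row 5 is "012" in base 3, since 5 = 0*9 + 1*3 + 2
assert digits[5].tolist() == [0, 1, 2]
# The last row has every digit at its maximum, N - 1
assert digits[-1].tolist() == [N - 1] * 3
```

The column vector k and the length-3 divisors broadcast against each other, so all N**3 rows are computed in one shot without a Python-level loop.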


Runtime test

In [111]: def itertools_based(N):
     ...:     out = np.zeros((N**3,4),dtype=int)
     ...:     out[:,:3] = list(product(np.arange(N), repeat=3))
     ...:     return out
     ...: 
     ...: def broadcasting_based(N):
     ...:     out = np.zeros((N**3,4),dtype=int)
     ...:     out[:,:3] = (np.arange(N**3)[:,None]//[N**2,N,1])%N
     ...:     return out


In [112]: N = 20

In [113]: np.allclose(itertools_based(N),broadcasting_based(N)) # Verify results
Out[113]: True

In [114]: %timeit itertools_based(N)
100 loops, best of 3: 7.42 ms per loop

In [115]: %timeit broadcasting_based(N)
1000 loops, best of 3: 1.23 ms per loop

Now, let's time just the creation of the list of tuples and compare it against the NumPy-based computation:

In [116]: %timeit list(product(np.arange(N), repeat=3))
1000 loops, best of 3: 746 µs per loop

In [117]: %timeit (np.arange(N**3)[:,None]//[N**2,N,1])%N
1000 loops, best of 3: 1.09 ms per loop

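So, taken on its own, creating the list of tuples with itertools is actually faster than the NumPy computation (746 µs vs 1.09 ms). The itertools-based function loses its time when that list is converted and copied into the NumPy array, which is the cost suspected earlier. So, if you are happy with just the first three columns as a list of tuples, go with itertools; if you need a NumPy array as output, the broadcasting-based approach is the faster one in this test.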
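
Since the question also mentions RAM and changing the dtype, here is a rough sketch of how the output size depends on it. The dtype parameter added to broadcasting_based is an assumption made for this illustration, not part of the functions timed above. Values 0..99 fit in int8, whose maximum is 127.

```python
import numpy as np

N = 100  # the size used in the question


def broadcasting_based(N, dtype):
    """Approach #2 with a caller-chosen output dtype (illustrative variant)."""
    out = np.zeros((N**3, 4), dtype=dtype)
    out[:, :3] = (np.arange(N**3)[:, None] // [N**2, N, 1]) % N
    return out


big = broadcasting_based(N, np.int64)
small = broadcasting_based(N, np.int8)  # safe here because N - 1 <= 127

assert np.array_equal(big, small)
print(big.nbytes, small.nbytes)  # 32000000 4000000
```

Note that the intermediate array on the right-hand side is still computed with NumPy's default integer type, so this only shrinks the stored result, not the peak memory during construction.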

Upvotes: 5
