user10288040

Reputation: 21

Python Numpy appending multiple lists from objects

I am calling a function several times, and each call returns a NumPy array:

for x in range(0,100):
        d = simulation3()

d = [0, 1, 2, 3]
d = [4, 5, 6, 7]

..and many more

I want to take each list and append it to a 2D array.

final_array = [[0, 1, 2, 3],[4, 5, 6, 7]...and so forth]

I tried creating an empty array (final_array = np.zeros((4, 4))) and appending to it, but the values are appended after the existing 4x4 matrix instead of filling it.

Can anyone help me with this? thank you!

Upvotes: 2

Views: 809

Answers (3)

jpp

Reputation: 164623

You can use np.fromiter to create an array from an iterable. Since, by default, this function only works with scalars, you can use itertools.chain to help:

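from itertools import chain

import numpy as np

np.random.seed(0)

def simulation3():
    # stand-in for the asker's function: 4 random integers in [0, 10)
    return np.random.randint(0, 10, 4)

n = 5
# flatten the n results into one stream of scalars, then reshape to (n, 4)
d = np.fromiter(chain.from_iterable(simulation3() for _ in range(n)), dtype='i')
d.shape = n, 4

print(repr(d))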

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6],
       [8, 8, 1, 6],
       [7, 7, 8, 1]], dtype=int32)

But this is relatively inefficient. NumPy performs best with fixed size arrays. If you know the size of your array in advance, you can define an empty array and update rows sequentially. See the alternatives described by @norok2.
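When the total number of values is known up front, np.fromiter also accepts a count argument, which lets it allocate the output once rather than growing it as the iterator is consumed. The following is only a sketch under that assumption: simulation3() is the same random stand-in as above, and the names n_rows and n_cols are introduced here for illustration.

```python
from itertools import chain

import numpy as np

def simulation3():
    # Stand-in for the asker's function: 4 random integers in [0, 10).
    return np.random.randint(0, 10, 4)

n_rows, n_cols = 5, 4  # assumed to be known in advance

# count tells np.fromiter how many scalars to expect, so the output
# buffer is allocated once instead of being grown on demand.
flat = np.fromiter(
    chain.from_iterable(simulation3() for _ in range(n_rows)),
    dtype=np.int64,
    count=n_rows * n_cols,
)
d = flat.reshape(n_rows, n_cols)
print(d.shape)
```

If the iterator yields fewer than count items, np.fromiter raises an error, so this only fits the case where every call returns the same number of values.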

Upvotes: 1

norok2

Reputation: 26886

The optimal solution depends on the numbers / sizes you are dealing with. My favorite solution (which only works if you already know the size of the final result) is to initialize the array that will contain your results and then fill it row by row using views. This is the most memory-efficient solution.

If you do not know the size of the final result, then you are better off generating a list of lists, which can be converted (or stacked) into a NumPy array at the end of the process.

Here are some examples, where gen_1d_list() is used to generate some random numbers to mimic the result of simulation3() (meaning that in the following code, you should replace gen_1d_list(n, dtype) with simulation3()):

  • stacking1() implements the filling using views
  • stacking2() implements the list generation and converting to NumPy array
  • stacking3() implements the list generation and stacking to NumPy array
  • stacking4() implements the dynamic modification of a NumPy array using vstack() as proposed in another answer.

import numpy as np

def gen_1d_list(n, dtype=int):
    # mimic simulation3(): a list of n random integers in [1, 100)
    return list(np.random.randint(1, 100, n, dtype))

def stacking1(n, m, dtype=int):
    # preallocate the result, then fill one row at a time
    arr = np.empty((n, m), dtype=dtype)
    for i in range(n):
        arr[i] = gen_1d_list(m, dtype)
    return arr

def stacking2(n, m, dtype=int):
    # build a list of lists, then convert once
    items = [gen_1d_list(m, dtype) for i in range(n)]
    arr = np.array(items)
    return arr

def stacking3(n, m, dtype=int):
    # build a list of lists, then stack once
    # (the second positional argument of np.stack() is axis, not dtype)
    items = [gen_1d_list(m, dtype) for i in range(n)]
    arr = np.stack(items)
    return arr

def stacking4(n, m, dtype=int):
    # grow the array with vstack(); every call copies the whole array
    arr = np.zeros((0, m), dtype=dtype)
    for i in range(n):
        # put arr first so that new rows are appended in call order
        arr = np.vstack((arr, gen_1d_list(m, dtype)))
    return arr
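For the case where the final size is not known in advance, the list-then-convert idea can be sketched as below. This is an illustration only: simulation3() is a random stand-in for the asker's function, and the stopping rule (stop once a row sums to more than 25, with a cap of 1000 rows) is a hypothetical condition made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulation3():
    # Hypothetical stand-in: one row of 4 random integers in [0, 10).
    return rng.integers(0, 10, 4)

rows = []
# Hypothetical stopping rule: the number of rows is not known beforehand.
while len(rows) < 1000:
    row = simulation3()
    rows.append(row)          # appending to a Python list is cheap
    if row.sum() > 25:
        break

final_array = np.array(rows)  # a single conversion at the end
print(final_array.shape)
```

Appending to a list does not copy the rows already collected, which is why this tends to scale better than growing a NumPy array inside the loop.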

Time-wise, stacking1() and stacking2() are more or less equally fast, while stacking3() and stacking4() are slower (and, in proportion, much slower for small size inputs).

Some numbers, for small size inputs:

n, m = 4, 10
%timeit stacking1(n, m)
# 15.7 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit stacking2(n, m)
# 14.2 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit stacking3(n, m)
# 22.7 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit stacking4(n, m)
# 31.8 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

and for larger size inputs:

n, m = 4, 1000000
%timeit stacking1(n, m)
# 344 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking2(n, m)
# 350 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking3(n, m)
# 370 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking4(n, m)
# 369 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
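The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The two functions below are simplified re-statements of the stacking1() and stacking2() ideas, named differently here to make clear they are illustrative; absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

def fill_preallocated(n, m):
    # Same idea as stacking1(): allocate once, then assign row by row.
    arr = np.empty((n, m), dtype=int)
    for i in range(n):
        arr[i] = np.random.randint(1, 100, m)
    return arr

def convert_list(n, m):
    # Same idea as stacking2(): collect rows, convert once at the end.
    return np.array([np.random.randint(1, 100, m) for _ in range(n)])

n, m = 4, 10
results = {}
for func in (fill_preallocated, convert_list):
    # repeat() returns one total time per run; the minimum is the least noisy.
    times = timeit.repeat(lambda: func(n, m), number=1000, repeat=5)
    results[func.__name__] = min(times) / 1000
    print(f"{func.__name__}: {results[func.__name__] * 1e6:.1f} µs per call")
```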

Upvotes: 0

Javad Sameri

Reputation: 1319

There are multiple ways to do this in NumPy; the easiest is to use vstack.

For example:

import numpy as np

# the arrays you want to combine
d1 = [0, 1, 2, 3]
d2 = [4, 5, 6, 7]
d3 = [4, 5, 6, 7]

# initialize your variable with zero rows
X = np.zeros((0, 4))

# then, each time you call your function, use np.vstack like this
# (X goes first so that each new row is added at the bottom):
X = np.vstack((X, np.array(d1)))
X = np.vstack((X, np.array(d2)))
X = np.vstack((X, np.array(d3)))

# finally, you have your array as below
#array([[0., 1., 2., 3.],
#       [4., 5., 6., 7.],
#       [4., 5., 6., 7.]])
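Each np.vstack call builds a new array and copies everything accumulated so far, so calling it once per row gets slower as the array grows. A possible variation, sketched here with the same sample lists, is to collect the rows in a plain Python list and call np.vstack a single time at the end. The loop over (d1, d2, d3) stands in for repeated calls to the asker's function.

```python
import numpy as np

d1 = [0, 1, 2, 3]
d2 = [4, 5, 6, 7]
d3 = [4, 5, 6, 7]

rows = []
for d in (d1, d2, d3):  # stand-in for: d = simulation3() inside the loop
    rows.append(d)

# One stacking call at the end; rows keep the order in which they were added.
X = np.vstack(rows)
print(X)
# [[0 1 2 3]
#  [4 5 6 7]
#  [4 5 6 7]]
```

Because the inputs here are integer lists and there is no float array to start from, the result has an integer dtype.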

Upvotes: 0
