Appending/concatenating arrays inside 2D array

Question

I'm trying to generate a 3x4 array in which each element is itself an array of unknown size. During the process I append new numbers, one at a time, to certain cells of the 3x4 matrix. I eventually want to end up with an array that looks like this:

[[[1,8,9],[1,2],[],[]],
[[8],[],[4,5],[9,1]],
[[],[7,1,4],[],[2,1,3]]]

So far I've been trying to use append and concatenate, but I can't find a good way to do this, since the inner arrays keep changing size. I also don't know the best way to initialize my matrix. Simplified, my code looks like this:

import numpy as np

mat = np.empty((3,4,1))
for x in range(1000):
    i, j, value = somefunction()
    # ValueError: np.append returns 2 elements, but mat[i,j,:] has room for only 1
    mat[i,j,:] = np.append(mat[i,j,:], value)

Does anybody know the best way to append (or concatenate or...) these values to my matrix? I have been looking up similar questions concerning appending and concatenation and tried a lot of different things, but I wasn't able to figure it out. I found it quite hard to explain my question, so I hope my description is clear.

MB-F · Accepted Answer

You can use so-called object arrays to get this job done. Normally, numpy arrays consist of a primitive type, but it is possible to create arrays where each element is an arbitrary Python object. This way you can make an array that contains arrays.
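To illustrate the idea on a toy case (a 2-element container rather than the 3x4 matrix; the values are just taken from the example above), the sketch below stores two arrays of different lengths in one object array:

```python
import numpy as np

# A 2-element object array: each slot can hold any Python object
arr = np.empty(2, dtype=object)
arr[0] = np.array([1, 8, 9])  # a length-3 array...
arr[1] = np.array([1, 2])     # ...and a length-2 array in the same container

print(arr.dtype)              # object
print([a.size for a in arr])  # [3, 2]
```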

mat = np.empty((3, 4), dtype=object)

Note that each element in mat is now None. Let's fill the matrix:

for x in range(1000):
    i, j, value = somefunction()
    if mat[i, j] is None:
        mat[i, j] = np.array(value)
    else:
        mat[i, j] = np.append(mat[i, j], value)
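A self-contained version of the loop above can be run as follows. Note that `somefunction` here is a hypothetical stand-in that returns a random cell index and a random integer, since the real function isn't shown. The sketch also uses `np.array([value])` instead of `np.array(value)` so that a cell holding a single value is still a 1-D array:

```python
import numpy as np

rng = np.random.default_rng(0)

def somefunction():
    """Hypothetical stand-in for the asker's function: returns a random
    cell index (i, j) and an integer value to append to that cell."""
    return int(rng.integers(3)), int(rng.integers(4)), int(rng.integers(10))

mat = np.empty((3, 4), dtype=object)

for _ in range(1000):
    i, j, value = somefunction()
    if mat[i, j] is None:
        # First value for this cell: start a 1-element, 1-D array
        mat[i, j] = np.array([value])
    else:
        # np.append returns a new, longer array; store it back in the cell
        mat[i, j] = np.append(mat[i, j], value)

# Number of values collected per cell (a cell that was never hit stays None)
sizes = [[0 if c is None else c.size for c in row] for row in mat]
print(sizes)
```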

This should get the job done, but it is horribly inefficient, for two reasons:

Arrays with dtype=object lose almost all of the properties that make numpy arrays fast. Every operation on an element must go through the Python interpreter, which normally would not happen.
numpy arrays are designed to be static; they are not designed to grow. What np.append really does is copy the old array into a new, bigger array, so each append gets slower as the array grows.
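The copying behaviour of np.append is easy to check: it leaves the original array untouched and returns a new one with its own memory.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)

print(a)                       # [1 2 3]    (the original is unchanged)
print(b)                       # [1 2 3 4]  (a freshly allocated array)
print(np.shares_memory(a, b))  # False: the existing data was copied
```

Because every call copies all existing elements, appending n values one at a time copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements in total, i.e. the cost grows quadratically. A Python list's append, by contrast, is amortized constant time.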

Considering that you want to reduce the whole thing into a 3x4 array in the end, it's probably better to work with regular Python lists:

# initialize a 3x4x0 hierarchy of nested lists
mat = [[[] for _ in range(4)] for _ in range(3)]

for x in range(1000):
    i, j, value = somefunction()
    mat[i][j].append(value)

# reduce each sub-list to its mean (an empty list gives nan, with a RuntimeWarning)
for i in range(3):
    for j in range(4):
        mat[i][j] = np.mean(mat[i][j])

# FINALLY convert to array
mat = np.array(mat)
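A runnable sketch of the list-based approach follows. Again, `somefunction` is a hypothetical random stand-in. It shows the mean reduction from above and also, in case you want to keep the ragged contents as in the target array from the question, a conversion to a 3x4 object array of 1-D arrays:

```python
import warnings
import numpy as np

rng = np.random.default_rng(0)

def somefunction():
    """Hypothetical stand-in: random cell (i, j) and an integer value."""
    return int(rng.integers(3)), int(rng.integers(4)), int(rng.integers(10))

# 3x4 nested lists; every cell starts as an empty list
cells = [[[] for _ in range(4)] for _ in range(3)]
for _ in range(1000):
    i, j, value = somefunction()
    cells[i][j].append(value)  # amortized O(1): no copy of the whole cell

# Option 1: reduce each cell to its mean (an empty cell gives nan)
with warnings.catch_warnings():
    # silence the "Mean of empty slice" RuntimeWarning for empty cells
    warnings.simplefilter("ignore", category=RuntimeWarning)
    means = np.array([[np.mean(c) for c in row] for row in cells])
print(means.shape)  # (3, 4)

# Option 2: keep the ragged contents as a 3x4 object array of 1-D arrays
ragged = np.empty((3, 4), dtype=object)
for i in range(3):
    for j in range(4):
        ragged[i, j] = np.array(cells[i][j], dtype=int)
print(ragged.shape)  # (3, 4)
```

The cell-by-cell fill in option 2 is deliberate: if every cell happened to contain the same number of values, np.array(cells, dtype=object) would build a 3-D array instead of a 3x4 array of arrays.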
