Reputation: 139
I'm trying to generate a 3x4 array in which each element is itself an array of unknown size. During the process I'm appending new numbers one by one to certain cells of the 3x4 matrix. I eventually want to end up with an array that looks like this:
[[[1,8,9],[1,2],[],[]],
[[8],[],[4,5],[9,1]],
[[],[7,1,4],[],[2,1,3]]]
Right now I've been trying to use append and concatenate, but I can't seem to find a good way to do this since the inside arrays are of changing size. Also I don't know what the best way is to initialize my matrix. Simplified, my code looks like this:
mat = np.empty((3,4,1))
for x in range(1000):
    i, j, value = somefunction()
    mat[i,j,:] = np.append(mat[i,j,:], value)
Does anybody know the best way to append (or concatenate or...) these values to my matrix? I have been looking up similar questions concerning appending and concatenation and tried a lot of different things, but I wasn't able to figure it out. I found it quite hard to explain my question, so I hope my description is clear.
Upvotes: 1
Views: 621
Reputation: 231335
An easy way to test whether such an array will be useful is to wrap your list of lists in np.array:
In [767]: mat = np.array([[[1,8,9],[1,2],[],[]],
...: [[8],[],[4,5],[9,1]],
...: [[],[7,1,4],[],[2,1,3]]])
In [768]: mat
Out[768]:
array([[list([1, 8, 9]), list([1, 2]), list([]), list([])],
[list([8]), list([]), list([4, 5]), list([9, 1])],
[list([]), list([7, 1, 4]), list([]), list([2, 1, 3])]], dtype=object)
In [769]: mat.shape
Out[769]: (3, 4)
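The result is a (3,4) object dtype array. This isn't the most reliable way of making an object dtype array (starting with np.empty((3,4),object) and filling it is more general, and recent NumPy releases require passing dtype=object explicitly for ragged input like this), but in this case it works fine.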
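As a minimal sketch of the more general route, assuming the same nested list as above, the object array can be allocated first and then filled cell by cell. The names `nested` and `mat` are only illustrative.

```python
import numpy as np

# The ragged nested list from the question.
nested = [[[1, 8, 9], [1, 2], [], []],
          [[8], [], [4, 5], [9, 1]],
          [[], [7, 1, 4], [], [2, 1, 3]]]

# Allocate a (3, 4) array whose cells can hold arbitrary Python objects.
mat = np.empty((3, 4), dtype=object)

# Assigning to a single cell stores the list itself as that cell's object.
for i, row in enumerate(nested):
    for j, cell in enumerate(row):
        mat[i, j] = list(cell)

# Each cell is an ordinary list, so appending works in place.
mat[0, 2].append(5)
```

Because the shape is fixed up front, this does not depend on how np.array interprets ragged input.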
But such an array doesn't have many advantages compared to the original list of lists. Most of the faster array operations don't work. Most tasks will require Python level iteration over the list elements.
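A small illustration of why the usual array operations are of little use here, using a hypothetical two-cell object array: arithmetic is simply delegated to the Python objects in each cell, so it follows list semantics rather than numeric ones.

```python
import numpy as np

# A tiny object array holding two lists of different lengths.
cells = np.empty(2, dtype=object)
cells[0] = [1, 2]
cells[1] = [3]

# Multiplication is applied per element as list * int, i.e. list repetition.
doubled = cells * 2

# Addition is applied per element as list + int, which Python rejects.
try:
    cells + 1
    add_failed = False
except TypeError:
    add_failed = True
```

Here `doubled` holds repeated lists rather than doubled numbers, and the addition fails with a TypeError.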
I could use np.vectorize to iterate, for example to take means:
In [775]: np.vectorize(np.mean)(mat)
/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python3.5/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Out[775]:
array([[ 6. , 1.5, nan, nan],
[ 8. , nan, 4.5, 5. ],
[ nan, 4. , nan, 2. ]])
It doesn't like taking the mean of an empty list. We could write a simple function that handles [] more gracefully.
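One possible version of such a function, as a sketch: `safe_mean` is a hypothetical helper name, and returning nan for an empty cell is an assumption about what "gracefully" should mean here.

```python
import numpy as np

def safe_mean(values):
    """Mean of a sequence, or nan for an empty one (avoids the RuntimeWarning)."""
    return float(np.mean(values)) if len(values) > 0 else float("nan")

nested = [[[1, 8, 9], [1, 2], [], []],
          [[8], [], [4, 5], [9, 1]],
          [[], [7, 1, 4], [], [2, 1, 3]]]

# Build the (3, 4) object array of lists cell by cell.
mat = np.empty((3, 4), dtype=object)
for i, row in enumerate(nested):
    for j, cell in enumerate(row):
        mat[i, j] = cell

# otypes=[float] fixes the output dtype, so np.vectorize does not need
# a trial call on the first element to guess it.
means = np.vectorize(safe_mean, otypes=[float])(mat)
```

This is still a Python-level loop under the hood; np.vectorize only makes the iteration more convenient to write.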
I could turn the lists into arrays (note the use of otypes):
In [777]: arr = np.vectorize(np.array,otypes=[object])(mat)
In [778]: arr
Out[778]:
array([[array([1, 8, 9]), array([1, 2]), array([], dtype=float64),
array([], dtype=float64)],
[array([8]), array([], dtype=float64), array([4, 5]), array([9, 1])],
[array([], dtype=float64), array([7, 1, 4]),
array([], dtype=float64), array([2, 1, 3])]], dtype=object)
though I'm not sure this buys us much.
Upvotes: 1
Reputation: 23637
You can use so-called object arrays to get this job done. Normally, numpy arrays consist of primitive types, but it is possible to create arrays where each element is an arbitrary Python object. This way you can make an array that contains arrays.
mat = np.empty((3, 4), dtype=object)
Note that each element in mat is now None. Let's fill the matrix:
for x in range(1000):
    i, j, value = somefunction()
    if mat[i, j] is None:
        # first value for this cell: start a 1-D array
        mat[i, j] = np.array([value])
    else:
        # np.append returns a new, longer array
        mat[i, j] = np.append(mat[i, j], value)
This should get the job done, but it's horribly inefficient for two reasons:
1. Arrays with dtype=object lose almost all the properties that make numpy arrays fast. Every operation on an element must involve the Python interpreter, which normally would not happen.
2. What np.append really does is copy the old array into a new, bigger array. This gets slower the more the array grows.
Considering that you want to reduce the whole thing into a 3x4 array in the end, it's probably better to work with regular Python lists:
# initialize a 3x4x0 hierarchy of nested lists
mat = [[[] for _ in range(4)] for _ in range(3)]
for x in range(1000):
    i, j, value = somefunction()
    mat[i][j].append(value)
# reduce each sub-list to its mean (empty list -> nan)
for i in range(3):
    for j in range(4):
        mat[i][j] = np.mean(mat[i][j])
# FINALLY convert to array
mat = np.array(mat)
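For anyone who wants to run the list-based approach end to end, here is a self-contained sketch. The body of `somefunction` is a made-up stand-in (random cell and random value), since the question never shows the real one, and the explicit check for empty cells is an added choice to avoid the "Mean of empty slice" warning.

```python
import random
import numpy as np

random.seed(0)  # fixed seed only so the sketch is repeatable

def somefunction():
    # Hypothetical stand-in for the asker's function:
    # returns a random row index, column index and value.
    return random.randrange(3), random.randrange(4), random.randint(0, 9)

# 3x4 grid of empty lists; each cell grows independently.
cells = [[[] for _ in range(4)] for _ in range(3)]
for _ in range(1000):
    i, j, value = somefunction()
    cells[i][j].append(value)  # list.append is amortized O(1), no copying

# Reduce every cell to its mean; empty cells become nan without a warning.
means = np.array([[np.mean(c) if c else np.nan for c in row] for row in cells])
```

Converting to a numpy array only at the very end keeps all the growing in plain lists, where it is cheap.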
Upvotes: 2