Say I have the following NumPy array:
a = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
And I want the cumulative sum along each row, like this:
a.cumsum(axis=1)
array([[ 1., 3., 6.],
[ 1., 3., 6.],
[ 1., 3., 6.]])
Is there any way to do the same with a triangular array/matrix?
b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]])
Basically the following result:
array([[1.0, 3.0, 6.0], [2.0, 5.0], [3.0]], dtype=object)
I tried the following, but I get an error:
b.cumsum(axis=1)
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-76-831556b68f3f>", line 1, in <module>
b.cumsum(axis=1)
ValueError: axis(=1) out of bounds
Should I redefine b so that it holds nan values? I'd like to avoid adding zeros at the end of my shorter arrays (my real arrays could contain zeros, which for me is different from having no value at all).
Upvotes: 1
The question is based on a false premise. There is no such thing as a triangular array in NumPy, so it does not make sense to ask how to find a cumulative sum of one. If you write
b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]])
you get a one-dimensional array of the kind array([object, object, object]). There is no matrix structure here, no axes to swap, no ufuncs to apply, really nothing NumPy-related. It is just a bunch of Python objects which happen to be Python lists (they are not NumPy arrays).
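A quick way to check this yourself is to inspect the shape and dtype. This is only a sketch: the name ragged is made up for illustration, and dtype=object is passed explicitly because recent NumPy versions refuse to build an array from rows of unequal length without it.

```python
import numpy as np

# dtype=object is spelled out here; recent NumPy versions raise an error
# for rows of unequal length unless it is given.
ragged = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]], dtype=object)

print(ragged.shape)     # (3,)  one axis only, so axis=1 does not exist
print(ragged.dtype)     # object
print(type(ragged[0]))  # <class 'list'>
```

Since the array has a single axis, asking for axis=1 fails, which is what the traceback in the question reports.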
Representing missing values by NaN is a common thing to do.
import numpy as np
row_list = [[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]]
max_length = max(len(row) for row in row_list)
# Pad each shorter row with NaN so that all rows have the same length
b = np.array([row + [np.nan]*(max_length - len(row)) for row in row_list])
Now b is an honest float-datatype NumPy array, to which you can apply cumsum or whatever.
b.cumsum(axis=1)
returns
array([[ 1., 3., 6.],
[ 2., 5., nan],
[ 3., nan, nan]])
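If you would rather not pad at all, one alternative (not a single NumPy operation, just a plain Python loop) is to take the cumulative sum of each row separately. A minimal sketch, where ragged_cumsum is a name chosen here for illustration:

```python
import numpy as np

row_list = [[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]]

# Cumulative sum of each row on its own; every row keeps its original length.
ragged_cumsum = [np.cumsum(row) for row in row_list]

for row in ragged_cumsum:
    print(row)
```

The result is a list of three separate arrays of lengths 3, 2 and 1, so it avoids padding but gives up vectorized operations across rows.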
There is a masked array module (numpy.ma) for more complex things of this kind, but NaN-padding works fine for basic operations on a ragged matrix. Some other things one can do:
np.nansum(b, axis=1) # sum, ignoring NaN
np.nanmean(b, axis=1) # mean, ignoring NaN
np.isnan(b) # find where NaN are in the array
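For the cumulative sum specifically, here is a rough sketch of two more options on the padded array: np.nancumsum, and the masked-array route via np.ma.masked_invalid. The variable names filled_total and masked_total are made up for this example.

```python
import numpy as np

b = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, np.nan],
              [3.0, np.nan, np.nan]])

# np.nancumsum treats NaN as zero, so padded cells repeat the running total.
filled_total = np.nancumsum(b, axis=1)

# A masked array hides the NaN cells instead; the result stays masked there.
masked_total = np.ma.masked_invalid(b).cumsum(axis=1)

print(filled_total)
print(masked_total)
```

Note that np.nancumsum no longer tells you which cells were missing, while the masked result keeps that information in its mask.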
Upvotes: 3