Cumulative sum of triangular matrix numpy

Question

Say I have the following numpy array:

a = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])

And I want the cumulative sum along each row, like this:

a.cumsum(axis=1)
array([[ 1.,  3.,  6.],
       [ 1.,  3.,  6.],
       [ 1.,  3.,  6.]])

Is there any way to do the same with a triangular array/matrix?

b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]])

Basically the following result:

array([[1.0, 3.0, 6.0], [2.0, 5.0], [3.0]], dtype=object)

I tried the following, but I get this error message:

b.cumsum(axis=1)

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in 
    b.cumsum(axis=1)
ValueError: axis(=1) out of bounds

Should I try to redefine b in order to have nan values? I'd like to avoid adding zeros at the end of my smaller arrays (my real arrays could contain zeros, which for me is different from having no values at all).

user6655984 · Accepted Answer

The question is based on a false premise. There is no such thing as a triangular array in NumPy, so it does not make sense to ask how to find a cumulative sum of one. If you write

b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]])

you get a one-dimensional array of the kind array([object, object, object]). There is no matrix structure here, no axes to swap, no ufuncs to apply, really nothing NumPy-related. Just a bunch of Python objects which happen to be Python lists (they are not NumPy arrays).
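A quick way to check this is to inspect the array directly. The sketch below passes dtype=object explicitly, which is an addition to the original line: newer NumPy releases reject ragged nested lists without it, while older ones created the object array silently.

```python
import numpy as np

# dtype=object is given explicitly (an assumption beyond the original snippet);
# newer NumPy versions raise a ValueError for ragged input without it.
b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]], dtype=object)

print(b.shape)     # a single axis of length 3, so axis=1 does not exist
print(b.dtype)     # object
print(type(b[0]))  # each element is a plain Python list, not an ndarray
```

Because there is only one axis, b.cumsum(axis=1) has nothing to operate on, which is what the "axis(=1) out of bounds" error reports.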

Representing missing values by NaN is a common thing to do.

import numpy as np

row_list = [[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]]
max_length = max(len(row) for row in row_list)
# Pad each shorter row with NaN so that all rows have the same length
b = np.array([row + [np.nan] * (max_length - len(row)) for row in row_list])

Now b is an honest float-datatype NumPy array, to which you can apply cumsum or whatever.

b.cumsum(axis=1)

returns

array([[  1.,   3.,   6.],
       [  2.,   5.,  nan],
       [  3.,  nan,  nan]])
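If the ragged shape from the question is needed at the end, one possible approach is to drop the NaN padding from each row of the result. This is only a sketch, and it assumes the real data contains no NaN values of its own; the names padded and ragged are illustrative.

```python
import numpy as np

row_list = [[1.0, 2.0, 3.0], [2.0, 3.0], [3.0]]
max_length = max(len(row) for row in row_list)
b = np.array([row + [np.nan] * (max_length - len(row)) for row in row_list])

# Cumulative sum on the padded array; NaN propagates into the padded cells
padded = b.cumsum(axis=1)

# Keep only the non-NaN entries of each row to get back a list of lists
ragged = [row[~np.isnan(row)].tolist() for row in padded]
print(ragged)  # [[1.0, 3.0, 6.0], [2.0, 5.0], [3.0]]
```

Zeros in the data are unaffected by this, since only NaN cells are treated as missing.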

There is a masked array module (numpy.ma) for more complex things of this kind, but NaN-padding works fine for basic operations on a ragged matrix. Some other things one can do:

np.nansum(b, axis=1)    # sum, ignoring NaN 
np.nanmean(b, axis=1)   # mean, ignoring NaN
np.isnan(b)             # find where NaN are in the array
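For comparison, here is a rough sketch of the same cumulative sum using the masked array module mentioned above, alongside np.nancumsum. The variable names are illustrative. Note the difference: the masked version keeps the missing cells masked, whereas np.nancumsum treats NaN as zero and so carries the last running total into the padded cells.

```python
import numpy as np

b = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, np.nan],
              [3.0, np.nan, np.nan]])

# Mask the NaN cells; the cumulative sum stays masked at the same positions
masked = np.ma.masked_invalid(b)
masked_cumsum = masked.cumsum(axis=1)
print(masked_cumsum)

# NaN counted as zero: padded cells repeat the last running total of the row
nan_as_zero = np.nancumsum(b, axis=1)
print(nan_as_zero)
```

Which of the two is appropriate depends on whether the padded positions should stay "missing" in the output or not.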
