Sakurai.JJ

Reputation: 684

Inverse of cumsum for multi-dimensional array in python

I would like to know how to perform the inverse operation of cumulative sum on a multi-dimensional array in Python.

For example, we can get cumulative array P of a given 2D array T by

import numpy as np
T = np.array([[1,3,2],[5,5,6],[1,8,3]])
P = np.cumsum(np.cumsum(T, axis=0), axis=1)
print(P)
# P is then, 
# [[1, 4, 6],
#  [6,14,22],
#  [7,23,34]]

My question is: how do I reconstruct T from P?

The above is a two-dimensional case, but I need an implementation where T and P can have any number of dimensions.

An answer exists for Julia: Inverse of cumsum in Julia

Still, I am looking for a Python solution.

EDIT

Thank you for the nice answers. I have compared the two suggested methods in terms of computational cost; both turn out to be efficient enough.

T = np.random.randint(100, size=[100,50,25,50])
%timeit cumdiff(T) 
#319 ms ± 134 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit decumsum(T)
#324 ms ± 943 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 1

Views: 122

Answers (3)

Soudipta Dutta

Reputation: 2152

Original: [[10, 100, 1000], [50, 50, 60], [1, 2, 3]]

Final :

np.array([[10,110,1110],[60,210,1270],[61,213,1276]])
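For reference, Final is just the double cumulative sum of Original; a minimal sketch reproducing it with the arrays above:

```python
import numpy as np

T = np.array([[10, 100, 1000], [50, 50, 60], [1, 2, 3]])
# cumsum down the rows (axis 0), then across the columns (axis 1)
P = np.cumsum(np.cumsum(T, axis=0), axis=1)
print(P)
# [[  10  110 1110]
#  [  60  210 1270]
#  [  61  213 1276]]
```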

Now, from Final to Original:

Method 1:

import numpy as np

P = np.array([[10, 110, 1110], [60, 210, 1270], [61, 213, 1276]])

# Step 1: invert the cumsum along axis 1 (columns)
T_reconstructed = np.diff(P, axis=1, prepend=0)

# Step 2: invert the cumsum along axis 0 (rows)
T_reconstructed = np.diff(T_reconstructed, axis=0, prepend=0)
print(T_reconstructed)
'''
[[  10  100 1000]
 [  50   50   60]
 [   1    2    3]]
'''

Upvotes: 1

simon

Reputation: 5531

You can use numpy.diff() with prepend=0 to produce the cumdiff function below (likewise, the cumsum function below generalizes your creation of P to arbitrarily many dimensions):

import numpy as np

def cumsum(a):
    s = np.copy(a)
    for dim in range(a.ndim):
        np.cumsum(s, axis=dim, out=s)
    return s

def cumdiff(a):
    d = a
    for dim in range(a.ndim):
        d = np.diff(d, axis=dim, prepend=0)
    return d

ndim = np.random.randint(1, 5)  # Test with max. 4 dimensions
shape = np.random.randint(1, 10, size=ndim)  # Test with max. 9 values per dimension
T = np.random.randint(100, size=shape)  # Test with a maximum value of 99 per element
P = cumsum(T)
assert (T == cumdiff(P)).all()

Intuitively it makes more sense to write for dim in range(a.ndim-1, -1, -1): ... in cumdiff(), and thus start subtracting along the highest axis, as is done in mozway's answer. Mathematically, the order of axes makes no difference; numerically I am not sure, so it might be worth testing.
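As a quick sanity check (a sketch, not part of the original answer): for integer input the two axis orders agree exactly, since differencing along distinct axes commutes; with floats, rounding could in principle differ.

```python
import numpy as np

def cumdiff_fwd(a):
    # difference along axis 0, then 1, then 2, ...
    d = a
    for dim in range(a.ndim):
        d = np.diff(d, axis=dim, prepend=0)
    return d

def cumdiff_rev(a):
    # difference along the highest axis first
    d = a
    for dim in range(a.ndim - 1, -1, -1):
        d = np.diff(d, axis=dim, prepend=0)
    return d

rng = np.random.default_rng(0)
P = rng.integers(0, 100, size=(4, 5, 6))
assert (cumdiff_fwd(P) == cumdiff_rev(P)).all()  # identical for integer input
```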

Upvotes: 2

mozway

Reputation: 262244

You can use diff and hstack/vstack:

tmp = np.hstack([P[:, [0]], np.diff(P)])
T2 = np.vstack([tmp[[0]], np.diff(tmp, axis=0)])

Variant with pandas:

import pandas as pd

df = pd.DataFrame(P)
tmp = df.diff(axis=1).fillna(df)
T2 = tmp.diff(axis=0).fillna(tmp).to_numpy()

Output:

array([[1, 3, 2],
       [5, 5, 6],
       [1, 8, 3]])

Intermediate tmp:

array([[ 1,  3,  2],
       [ 6,  8,  8],
       [ 7, 16, 11]])

Arbitrary number of dimensions:

We can apply the same logic along each dimension, using concatenate:

def decumsum(a):
    for x in range(a.ndim-1, -1, -1):
        a = np.concatenate([a[(slice(None),)*x+([0],)],
                            np.diff(a, axis=x)], axis=x)
    return a

T2 = decumsum(P)
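To unpack the indexing used above (an explanatory sketch, not part of the answer): (slice(None),)*x + ([0],) builds a tuple index equivalent to writing : for the first x axes and [0] for axis x, which selects the first slab along axis x while keeping that dimension:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
x = 1
# (slice(None),)*x + ([0],) here is (slice(None), [0]),
# i.e. the same as a[:, [0]]: first slab along axis 1, dimension kept
first = a[(slice(None),) * x + ([0],)]
print(first.shape)  # (2, 1, 4)
assert (first == a[:, [0]]).all()
```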

Validation:

np.random.seed(0)
T = np.random.randint(0, 10, size=(3,4,5,6))
P = T.copy()
for x in range(T.ndim):
    P = np.cumsum(P, axis=x)

np.allclose(T, decumsum(P)) # True

Upvotes: 2
