Reputation: 684
I would like to know how to perform the inverse of a cumulative sum on a multi-dimensional array in Python.
For example, we can get the cumulative array P of a given 2D array T by
import numpy as np
T = np.array([[1,3,2],[5,5,6],[1,8,3]])
P = np.cumsum(np.cumsum(T, axis=0), axis=1)
print(P)
# P is then,
# [[ 1,  4,  6],
#  [ 6, 14, 22],
#  [ 7, 23, 34]]
My question is: how do I reconstruct T from P?
The above is a two-dimensional case, but I need an implementation where T and P can have any number of dimensions.
An answer exists for Julia (Inverse of cumsum in Julia), but I am looking for a Python solution.
EDIT
Thank you for the nice answers. I have compared the two suggested methods in terms of computational cost. Apparently, both methods are efficient enough.
T = np.random.randint(100, size=[100,50,25,50])
%timeit cumdiff(T)
#319 ms ± 134 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit decumsum(T)
#324 ms ± 943 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 1
Views: 122
Reputation: 2152
Original :
[[  10  100 1000]
 [  50   50   60]
 [   1    2    3]]
Final :
np.array([[10,110,1110],[60,210,1270],[61,213,1276]])
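For reference, the Final array above can be reproduced from Original with the same double cumulative sum used in the question (a minimal sketch):

```python
import numpy as np

# Reproduce "Final" from "Original": cumulative sum along each axis in turn
T = np.array([[10, 100, 1000],
              [50,  50,   60],
              [ 1,   2,    3]])
P = np.cumsum(np.cumsum(T, axis=0), axis=1)
print(P)
# [[  10  110 1110]
#  [  60  210 1270]
#  [  61  213 1276]]
```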
Now, from Final to Original :
Method 1 :
import numpy as np
P = np.array([[10,110,1110],[60,210,1270],[61,213,1276]])
# Step 1: Apply inverse of cumsum on axis 1 (horizontal axis)
T_reconstructed = np.diff(P, axis=1, prepend=0)
# Step 2: Apply inverse of cumsum on axis 0 (vertical axis)
T_reconstructed = np.diff(T_reconstructed, axis=0, prepend=0)
print(T_reconstructed)
'''
[[ 10 100 1000]
[ 50 50 60]
[ 1 2 3]]
'''
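The prepend=0 trick is easiest to see in 1D, where a single diff undoes a single cumsum (a minimal sketch):

```python
import numpy as np

x = np.array([1, 3, 2, 5])
c = np.cumsum(x)              # [ 1  4  6 11]
# Subtracting each prefix sum from the next (with 0 prepended) restores x
back = np.diff(c, prepend=0)
print(back)                   # [1 3 2 5]
```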
Upvotes: 1
Reputation: 5531
You can use numpy.diff() with prepend=0 to produce the cumdiff function below (likewise, the cumsum function below generalizes your creation of P to arbitrarily many dimensions):
import numpy as np

def cumsum(a):
    # Cumulative sum along every axis in turn
    s = np.copy(a)
    for dim in range(a.ndim):
        np.cumsum(s, axis=dim, out=s)
    return s

def cumdiff(a):
    # Inverse: diff with a prepended zero along every axis in turn
    d = a
    for dim in range(a.ndim):
        d = np.diff(d, axis=dim, prepend=0)
    return d

ndim = np.random.randint(1, 5)               # Test with max. 4 dimensions
shape = np.random.randint(1, 10, size=ndim)  # Test with max. 9 values per dimension
T = np.random.randint(100, size=shape)       # Test with a maximum value of 99 per element
P = cumsum(T)
assert (T == cumdiff(P)).all()
Intuitively it makes more sense to write for dim in range(a.ndim-1, -1, -1): ... in cumdiff(), and thus start subtracting along the highest axis, as is done in mozway's answer. Mathematically, the order of axes makes no difference. Numerically I am not sure, so it might be worth testing.
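A quick numerical check of that concern could look like the sketch below (the helper names are mine, not from the answer): invert a float-valued P with both axis orders and compare against the original array.

```python
import numpy as np

def cumdiff_fwd(a):
    # Undo the cumulative sums, lowest axis first
    for dim in range(a.ndim):
        a = np.diff(a, axis=dim, prepend=0)
    return a

def cumdiff_rev(a):
    # Undo the cumulative sums, highest axis first
    for dim in range(a.ndim - 1, -1, -1):
        a = np.diff(a, axis=dim, prepend=0)
    return a

rng = np.random.default_rng(0)
T = rng.random((40, 30, 20))
P = T.copy()
for dim in range(T.ndim):
    P = np.cumsum(P, axis=dim)

# Both orders recover T up to floating-point rounding
print(np.allclose(cumdiff_fwd(P), T), np.allclose(cumdiff_rev(P), T))
```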
Upvotes: 2
Reputation: 262244
You can use diff and hstack/vstack:
tmp = np.hstack([P[:, [0]], np.diff(P)])          # undo the cumsum along axis 1
T2 = np.vstack([tmp[[0]], np.diff(tmp, axis=0)])  # undo the cumsum along axis 0
Variant with pandas:
import pandas as pd
df = pd.DataFrame(P)
tmp = df.diff(axis=1).fillna(df)           # NaNs in the first column are refilled from P
T2 = tmp.diff(axis=0).fillna(tmp).to_numpy()
Output:
array([[1, 3, 2],
[5, 5, 6],
[1, 8, 3]])
Intermediate tmp
:
array([[ 1, 3, 2],
[ 6, 8, 8],
[ 7, 16, 11]])
We can use the same logic for each dimension, with concatenate:
def decumsum(a):
    # Walk the axes from highest to lowest, undoing one cumsum per axis
    for x in range(a.ndim-1, -1, -1):
        a = np.concatenate([a[(slice(None),)*x + ([0],)],
                            np.diff(a, axis=x)], axis=x)
    return a

T2 = decumsum(P)
Validation:
np.random.seed(0)
T = np.random.randint(0, 10, size=(3,4,5,6))
P = T.copy()
for x in range(T.ndim):
    P = np.cumsum(P, axis=x)
np.allclose(T, decumsum(P))  # True
Upvotes: 2