muon

Reputation: 14057

numpy cumsum columns for varying lengths specified by list

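How can I take the cumulative sum of n consecutive elements in each column, where the segment lengths are given by a list? The cumsum should reset at the start of each segment. For example, lens = [3, 2] means a cumsum over the first 3 rows, a[:3], followed by a separate cumsum over the next 2 rows, a[3:5].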

import numpy as np
lens = [3, 2]
a = np.array(
       [[ 1,  2],
        [ 1,  2],
        [ 1,  2],
        [ 1,  2],
        [ 1,  2]])

The expected result is:

np.array(
       [[ 1,  2],
        [ 2,  4],
        [ 3,  6],
        [ 1,  2],
        [ 2,  4]])

I am trying to avoid explicit loops.

Upvotes: 0

Views: 396

Answers (1)

akuiper

Reputation: 215047

One option is to split the array at the segment boundaries, take the cumsum of each piece, and then concatenate the pieces:

np.concatenate(list(map(lambda a: np.cumsum(a, axis=0), np.array_split(a, np.cumsum(lens)))))
#array([[1, 2],
#       [2, 4],
#       [3, 6],
#       [1, 2],
#       [2, 4]], dtype=int32)
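
For readability, the same split-and-concatenate idea can be written as a small function. This is only a sketch: the name `segmented_cumsum_split` is made up for illustration, and it assumes that the lengths in `lens` add up to the number of rows in `a`. It splits at the segment ends excluding the last one, so no empty trailing piece is produced.

```python
import numpy as np

def segmented_cumsum_split(a, lens):
    """Cumulative sum along axis 0 that restarts at each segment in `lens`.

    Hypothetical helper; assumes sum(lens) == a.shape[0].
    """
    lens = np.asarray(lens)
    # Split points are the segment ends, leaving out the final one.
    bounds = np.cumsum(lens)[:-1]
    pieces = np.split(a, bounds, axis=0)
    # Cumsum each piece independently, then stitch the pieces back together.
    return np.concatenate([p.cumsum(axis=0) for p in pieces], axis=0)

a = np.array([[1, 2]] * 5)
out = segmented_cumsum_split(a, [3, 2])
print(out)
```

The list comprehension still loops in Python, but only once per segment rather than once per row.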

Another option, without splitting and recombining, is to create an auxiliary array that resets the running sum at the start of each segment:

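idx = np.cumsum([0] + lens)[:-1]   # start row of each segment (lens must be a list here)
aux = np.zeros_like(a)
# at the start of every segment but the first, subtract the previous segment's column sums
aux[idx[1:], :] = -np.add.reduceat(a, idx)[:-1]
(a + aux).cumsum(0)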

#array([[1, 2],
#       [2, 4],
#       [3, 6],
#       [1, 2],
#       [2, 4]], dtype=int32)
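
To see why the auxiliary array works, it may help to look at the intermediate values for the example from the question. The variable names `seg_sums`, `shifted` and `result` are introduced here only for illustration, and the walkthrough assumes every segment is non-empty.

```python
import numpy as np

a = np.array([[1, 2]] * 5)
lens = [3, 2]

# Start row of each segment: cumsum([0, 3, 2]) = [0, 3, 5], minus the last entry.
idx = np.cumsum([0] + lens)[:-1]          # [0, 3]

# Column sums of each segment: rows 0..2 and rows 3..4.
seg_sums = np.add.reduceat(a, idx)        # [[3, 6], [2, 4]]

# Place minus the previous segment's sum at the first row of the next segment.
aux = np.zeros_like(a)
aux[idx[1:]] = -seg_sums[:-1]             # row 3 becomes [-3, -6]

# Row 3 of a + aux is [1 - 3, 2 - 6] = [-2, -4], which cancels the running
# total [3, 6] and leaves [1, 2], so the cumsum restarts there.
shifted = a + aux
result = shifted.cumsum(axis=0)
print(result)
```

The subtraction cancels everything accumulated before the segment start, which is why a single global cumsum is enough.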

In the benchmark below, the two methods run at about the same speed:

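def split_concat(a):
    return np.concatenate(list(map(lambda a: np.cumsum(a, axis=0), np.array_split(a, np.cumsum(lens)))))

def reset_sum(a):
    # build the segment starts with np.concatenate so that lens can be a list or
    # an array ([0] + lens would add elementwise if lens were an array)
    idx = np.concatenate(([0], np.cumsum(lens)[:-1]))
    aux = np.zeros_like(a)
    aux[idx[1:], :] = -np.add.reduceat(a, idx)[:-1]
    return (a + aux).cumsum(0)


lens = np.arange(1, 1000)  # lengths 1..999 (reset_sum assumes every segment is non-empty)
a = np.ones((lens.sum(), 2))
(reset_sum(a) == split_concat(a)).all()
# True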

%timeit split_concat(a)
# 12.8 ms ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit reset_sum(a)
# 13.6 ms ± 87.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 3
