slaw

Reputation: 6869

Prepend Zero to Long Numpy Array

I have a very long 1D array that I'd like to calculate the cumulative sum for and then prepend a zero at the beginning of the resultant array.

import numpy as np

def padded_cumsum(x):
    cum_sum_x = np.cumsum(x)  # Creating a new array here is fine

    return np.pad(cum_sum_x, (1,0), mode='constant')  # Prepend a single zero w/ pad

x = np.arange(2**25)
print(padded_cumsum(x))

The function padded_cumsum will be called billions of times with varying array lengths and so I am trying to avoid any array copying as that is expensive. Also, I cannot alter the original array x by instantiating it with extra values/NaNs at the beginning/end. Since cum_sum_x needs to be created anyways, I suspect that I can sneak in the zero there by doing something hacky like:

def padded_cumsum(x):
    cum_sum_x = np.empty(x.shape[0] + 1)
    cum_sum_x[0] = 0
    cum_sum_x[1:] = np.cumsum(x)

    return cum_sum_x  # The zero is already in place; no pad call needed

Upvotes: 4

Views: 1823

Answers (4)

neuro630

Reputation: 81

Here's another option:

x = np.roll(x.cumsum(), 1)
x[0] = 0

Using the benchmark script from Markus' answer, I find that its performance sits between np.r_ and np.hstack. It does, however, seem to be the fastest option if you are using PyTorch rather than NumPy.
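A quick sanity check (the example values are mine, not from the answer): note that the roll trick yields an array of the same length as x, i.e. an exclusive prefix sum that drops the grand total, whereas the padded version in the question is one element longer:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])

# roll-based version: shift the cumsum right by one, zero the first slot
rolled = np.roll(x.cumsum(), 1)
rolled[0] = 0  # overwrites the grand total that wrapped around to the front

# reference: prepend a zero, then cumsum (one element longer than x)
padded = np.concatenate(([0], x)).cumsum()
```

Here rolled equals padded[:-1], so use this variant only if the final cumulative total is not needed.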

Upvotes: 0

Markus Dutschke

Reputation: 10606

Performance benchmark

I compared a few options, mainly coming from here. np.concatenate(([0], x)).cumsum() is the fastest.

result

(log-log plot of the benchmark results)

x: problem size, y: computing time for 1000 runs

code

import timeit
import random
import numpy as np
import matplotlib.pyplot as plt

cmds = [
    'np.r_[[0], x].cumsum()',
    'np.hstack(([0], x)).cumsum()',
    'np.concatenate(([0], x)).cumsum()',
    'csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])',
    ]
test_range = [1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
# test_range = [1e0, 1e1, 1e2]

ts = np.empty((len(cmds), len(test_range)), dtype=float)
for tt, size_float in enumerate(test_range):
    size = round(size_float)
    print('array size:', size)
    x = np.random.randint(low=0, high=100, size=size)

    n_trials = 1000
    for cc, cmd in enumerate(cmds):

        t = timeit.Timer(cmd, globals={**globals(), **locals()})
        t = t.timeit(n_trials)
        ts[cc, tt] = t
        print('time for {:d}x \"{:}\": {:.6f}'.format(n_trials, cmd, t))


fig, ax = plt.subplots(1, 1, figsize=(15, 10))
for cc, cmd in enumerate(cmds):
    ax.plot(test_range, ts[cc, :], label=cmd)
    print(cmd)
ax.legend()
ax.set_xscale('log')
ax.set_yscale('log')
plt.show()

output

array size: 1
time for 1000x "np.r_[[0], x].cumsum()": 0.015609
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005469
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.002997
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003499
array size: 10
time for 1000x "np.r_[[0], x].cumsum()": 0.018170
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005663
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.002993
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003511
array size: 100
time for 1000x "np.r_[[0], x].cumsum()": 0.018444
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005621
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.003145
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003770
array size: 1000
time for 1000x "np.r_[[0], x].cumsum()": 0.018034
time for 1000x "np.hstack(([0], x)).cumsum()": 0.007816
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.005275
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.006885
array size: 10000
time for 1000x "np.r_[[0], x].cumsum()": 0.036433
time for 1000x "np.hstack(([0], x)).cumsum()": 0.027001
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.024336
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.034565
array size: 100000
time for 1000x "np.r_[[0], x].cumsum()": 0.228152
time for 1000x "np.hstack(([0], x)).cumsum()": 0.219081
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.215639
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.311723
array size: 1000000
time for 1000x "np.r_[[0], x].cumsum()": 2.693319
time for 1000x "np.hstack(([0], x)).cumsum()": 2.656931
time for 1000x "np.concatenate(([0], x)).cumsum()": 2.634273
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 3.755322
np.r_[[0], x].cumsum()
np.hstack(([0], x)).cumsum()
np.concatenate(([0], x)).cumsum()
csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])

Upvotes: 0

B. M.

Reputation: 18628

You can cumsum in place:

def padcumsum(x):
    csum=np.hstack((0,x)) # 3x faster than pad.
    csum.cumsum(out=csum)
    return csum

For a further performance boost, you can install numba:

import numba
import numpy as np

@numba.njit
def perf(x):
    csum = np.empty(x.size + 1, x.dtype)
    csum[0] = 0
    for i in range(x.size):
        csum[i + 1] = csum[i] + x[i]
    return csum

which is about twice as fast as padcumsum.
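A quick check with my own example values that padcumsum produces the expected padded cumulative sum:

```python
import numpy as np

def padcumsum(x):
    # prepend a zero, then accumulate in place over the same buffer
    csum = np.hstack((0, x))
    csum.cumsum(out=csum)
    return csum

result = padcumsum(np.arange(5))  # input [0, 1, 2, 3, 4]
```

Writing the cumsum back into its own input buffer is safe here because each element is read before its slot is overwritten.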

Upvotes: 2

Paul Panzer

Reputation: 53029

Use the out keyword on a hand-allocated array.

pad = 1  # number of leading zeros to prepend
out = np.empty(len(x) + pad, dtype=x.dtype)
np.cumsum(x, out=out[pad:])
out[:pad] = 0
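Wrapped into a self-contained function (the function name and the pad=1 default are my own, matching the single leading zero the question asks for):

```python
import numpy as np

def padded_cumsum(x, pad=1):
    # allocate the output once; the cumulative sum is written
    # directly into the tail view, so no intermediate copy is made
    out = np.empty(len(x) + pad, dtype=x.dtype)
    np.cumsum(x, out=out[pad:])
    out[:pad] = 0
    return out

result = padded_cumsum(np.array([1, 2, 3]))
```

Because np.cumsum accepts a view as its out argument, the only allocation is the final array itself.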

Upvotes: 2
