user7038639

Reputation: 89

Summing Numpy Boolean Indices

I'm trying to find a way to sum an array of values based on a boolean index, using a modulo operation to determine where each month group begins and ends.

months = np.arange(36) + 1 # +1 to denote months rather than index
vals = np.ones(36)
vals[12:24] = 2
vals[24:36] = 3

# closest try:

vals.cumsum()[months % 12 == 0] # returns array([12., 36., 72.])

# target result = array([12, 24, 36])

The vals.sum() function just sums the whole thing, but cumsum accumulates over the whole thing, which isn't quite what I'm looking for. Target result is included above - this is a common spreadsheet summarization technique that would usually be done using a SUMIF function to sum values according to certain parameters.

Is there an easy way to do this? I'm sure there is and I'm just missing it. I've spent a bit of time trying to figure this out, and I would prefer not to use a for loop.

Thanks.

Upvotes: 1

Views: 353

Answers (2)

akuiper

Reputation: 214957

Seems you need np.add.reduceat:

np.add.reduceat(vals, np.flatnonzero((months - 1) % 12 == 0))
# array([ 12.,  24.,  36.])

Explanations:

months
# array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
#        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
#        35, 36])

1). Use modulo to find the positions where each sum should start, i.e. where (months - 1) % 12 == 0:

(months - 1) % 12 == 0
# array([ True, False, False, False, False, False, False, False, False,
#        False, False, False,  True, False, False, False, False, False,
#        False, False, False, False, False, False,  True, False, False,
#        False, False, False, False, False, False, False, False, False], dtype=bool)

2). np.flatnonzero is similar to np.where and returns the indices of the True entries, so the first sum runs from index 0 up to 12 (exclusive), the second from 12 up to 24, and so on:

np.flatnonzero((months - 1) % 12 == 0)
# array([ 0, 12, 24])

3). After finding out the indices, use np.add.reduceat to sum up the segments:

np.add.reduceat(vals, [0, 12, 24])
# array([ 12.,  24.,  36.])

Essentially, this is equivalent to [sum(vals[0:12]), sum(vals[12:24]), sum(vals[24:])] and gives the output you need.
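
For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch using the data from the question. The names starts, yearly and by_slices are only illustrative, and it assumes months is sorted and begins at the start of a 12-month block.

```python
import numpy as np

# Data from the question: 36 months, values 1, 2, 3 for years 1, 2, 3.
months = np.arange(36) + 1
vals = np.ones(36)
vals[12:24] = 2
vals[24:36] = 3

# Index of the first month of each 12-month block (illustrative name).
starts = np.flatnonzero((months - 1) % 12 == 0)

# Sum each segment vals[starts[i]:starts[i+1]]; the last one runs to the end.
yearly = np.add.reduceat(vals, starts)

# The same result written out with explicit slices, for comparison.
by_slices = np.array([vals[0:12].sum(), vals[12:24].sum(), vals[24:].sum()])

print(yearly.tolist())     # [12.0, 24.0, 36.0]
print(by_slices.tolist())  # [12.0, 24.0, 36.0]
```

Note that reduceat only needs the start index of each segment, which is why the condition uses (months - 1) % 12 rather than months % 12.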

Upvotes: 2

Ryan Tam

Reputation: 855

np.sum(vals[np.where(months % 12 == 0)[0]]) maybe?

np.where is used to select the indices.
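
A quick check on the question's data may help here. The sketch below (the names idx and total are just for illustration) suggests that this expression picks out the three month-end values and adds them into a single number, rather than producing one sum per 12-month block.

```python
import numpy as np

# Data from the question.
months = np.arange(36) + 1
vals = np.ones(36)
vals[12:24] = 2
vals[24:36] = 3

# Positions where the month number is a multiple of 12.
idx = np.where(months % 12 == 0)[0]
print(idx.tolist())  # [11, 23, 35]

# Only those three elements are summed: 1 + 2 + 3.
total = np.sum(vals[idx])
print(float(total))  # 6.0
```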

Upvotes: 1
