Joe

Reputation: 79

Computing sum of consecutive values in a vector that are greater than a constant number?

I couldn't summarize my question very well in the title. In one part of my code I need to compute the following:

Let's say we have a vector (e.g. a numpy array):

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]

We want to cap any number greater than 5 at 5:

a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]

Then we compute the sum of consecutive 5s and the number that follows them and replace all these elements with the resulting sum:

a = [3.2, 4, 5 + 2, 5 + 5 + 5 + 1.7, 2, 5 + 5 + 1, 3]

so the resulting array would be:

a = [3.2, 4, 7, 16.7, 2, 11, 3]

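I can do this with an explicit loop like this:

    import numpy as np

    a = np.array([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
    indx = np.where(a > 5)[0]
    a[indx] = 5  # cap values greater than 5
    counter = 0
    c = []
    while counter < len(a):
        elem = a[counter]
        if elem != 5:
            c.append(elem)
        else:
            temp = 0
            # accumulate the run of 5s, stopping at the end of the array
            while counter < len(a) and a[counter] == 5:
                temp += a[counter]
                counter += 1
            if counter < len(a):
                temp += a[counter]  # add the number that follows the run
            c.append(temp)
        counter += 1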

Is there a way to avoid the explicit loop? Perhaps by using the indx variable?

I have a vague idea based on turning the array into a string, a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]', replacing every ' 5,' with ' 5+', and then calling eval(a). However, is there an efficient way to find all indices where a substring occurs? And what about the fact that strings are immutable?

Upvotes: 2

Views: 125

Answers (3)

Julien

Reputation: 15071

This does what you want, fully vectorized in NumPy:

    import numpy as np

    a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])  # prepend a 0 so the first diff recovers the first value
    aa = np.where(a > 5, 5, a)  # cap values at 5; np.clip(a, None, 5) works too
    c = np.cumsum(aa)  # cumulative sum of the capped array
    np.diff(c[aa < 5])  # keep the running totals where the value is below 5, then difference them

    array([ 3.2,  4. ,  7. , 16.7,  2. , 11. ,  3. ])

If the array ends in a value greater than 5, also append a 0 so the last run is flushed. Append it only in that case, otherwise the result gains a spurious trailing 0.
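To see why the cumsum/diff trick works, here is a minimal sketch that wraps the same idea in a function. The name `sum_capped_runs` and the `cap` parameter are my own and not part of the answer above. It assumes that any value at or above the cap belongs to a run, and it handles the padding internally instead of editing the input array.

```python
import numpy as np

def sum_capped_runs(a, cap=5):
    """Hypothetical helper: cap values at `cap`, then merge each run of
    capped values with the first uncapped value that follows it."""
    aa = np.minimum(np.asarray(a, dtype=float), cap)  # values above cap become cap
    if aa.size == 0:
        return aa
    keep = aa < cap                     # positions that close a group
    c = np.cumsum(aa)                   # running total of the capped array
    ends = c[keep]                      # running total at the end of each group
    if not keep[-1]:
        ends = np.append(ends, c[-1])   # flush a trailing run of capped values
    # Differences between consecutive group-end totals are the group sums;
    # prepend=0.0 plays the role of the leading 0 in the padded array.
    return np.diff(ends, prepend=0.0)

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
print(sum_capped_runs(a))
```

For the example array this should reproduce the seven values shown above. An input such as `[1, 7, 8]`, which ends in a run, should give one extra group holding the sum of that run.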

Upvotes: 2

rafaelc

Reputation: 59264

You can use pandas for this: build group labels with cumsum and shift, group by those labels, and aggregate each group with sum.

    import pandas as pd

    a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
    df = pd.DataFrame(a, columns=['col1'])
    df.loc[df.col1 > 5, 'col1'] = 5  # cap values greater than 5
    s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()

    col1
    0.0     3.2
    1.0     4.0
    2.0     7.0
    3.0    16.7
    4.0     2.0
    5.0    11.0
    6.0     3.0

To get a NumPy array back, use .values:

    >>> s.values
    array([  3.2,   4. ,   7. ,  16.7,   2. ,  11. ,   3. ])
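If the grouping key is hard to read, the following pure-Python sketch mimics what `(df.col1 != 5).cumsum().shift().fillna(0)` computes. It is not pandas code. `group_labels` and `sum_by_label` are hypothetical helper names used only for illustration, and the input is assumed to be already capped at 5.

```python
from itertools import accumulate

def group_labels(values, cap=5):
    """Label each element with the count of non-cap values seen before it."""
    # Running count of (value != cap), the analogue of cumsum on a boolean column.
    running = list(accumulate(int(v != cap) for v in values))
    # Shift by one position and fill the first slot with 0.
    return [0] + running[:-1] if running else []

def sum_by_label(values, labels):
    """Sum the values that share a label, in order of first appearance."""
    totals = {}
    for v, lab in zip(values, labels):
        totals[lab] = totals.get(lab, 0) + v
    return list(totals.values())

capped = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]
labels = group_labels(capped)
print(labels)                        # [0, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6]
print(sum_by_label(capped, labels))
```

The shift is what makes each non-5 value share a label with the 5s before it rather than the 5s after it.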

Upvotes: 2

chris

Reputation: 2063

I think you can do this in a single pass. For each item:

  • if the value is 5 or more, don't append it to the list immediately, "defer" a 5 for now
  • if the value is less than 5, add it to all of the "deferred" 5's and append the sum


    a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
    result = []

    current_sum = 0
    for item in a:
        if item < 5:
            result.append(current_sum + item)
            current_sum = 0
        else:
            current_sum += 5

    if current_sum:
        result.append(current_sum)

    >>> result
    [3.2, 4, 7, 16.7, 2, 11, 3]
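The same single pass can be sketched as a generator, which may be handy if the values arrive as a stream rather than a list. `merge_capped_runs` and its `cap` parameter are hypothetical names of mine, not part of the answer above.

```python
def merge_capped_runs(values, cap=5):
    """Yield group sums in one pass (hypothetical generator variant)."""
    deferred = 0                    # total of capped values waiting for a closer
    for item in values:
        if item < cap:
            yield deferred + item   # close the current group
            deferred = 0
        else:
            deferred += cap         # defer a capped value
    if deferred:
        yield deferred              # trailing run with no closing value

print(list(merge_capped_runs([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])))
print(list(merge_capped_runs([1, 7, 8])))  # [1, 10]
```

The second call shows the trailing-run case: the two deferred 5s have no closing value, so they are emitted as their own group.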

Upvotes: 1
