Reputation: 79
I couldn't summarize my question very well in the title. I'm writing some code, and in one part of it I need to compute the following:
Let's say we have a vector (e.g. a numpy array):
a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
We want to replace any number greater than 5 with 5:
a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]
Then we compute the sum of each run of consecutive 5s plus the number that follows it, and replace all of these elements with the resulting sum:
a = [3.2, 4, 5+ 2, 5+ 5+ 5+ 1.7, 2, 5+ 5+ 1, 3]
so the resulting array would be:
a = [3.2, 4, 7, 16.7, 2, 11, 3]
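I can do this using a loop like this:
import numpy as np
a = np.array([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
indx = np.where(a > 5)[0]
a[indx] = 5
counter = 0
c = []
while counter < len(a):
    elem = a[counter]
    if elem != 5:
        c.append(elem)
    else:
        temp = 0
        # accumulate the run of 5s (assumes the array does not end with a 5)
        while elem == 5:
            temp += elem
            counter += 1
            elem = a[counter]
        temp += elem  # add the number that follows the run
        c.append(temp)
    counter += 1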
Is there a way to avoid the explicit loop? Perhaps by using the indx variable?
I have a vague idea if we turn it into a string:
a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]'
and then replace every ' 5,' with ' 5+' and then use eval(a). However, is there an efficient way to find all indices containing a sub-string? How about the fact that strings are immutable?
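A minimal sketch of that string idea, assuming the list is already available as text in exactly the format shown above. The variable names are only for illustration. `str.replace` returns a new string, so immutability is not an obstacle and no index search is needed:

```python
text = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]'

# str.replace builds and returns a new string; the original is left unchanged.
summed_text = text.replace(' 5,', ' 5+')

# eval is tolerable here only because the string is built locally from numbers;
# it should not be used on untrusted input.
result = eval(summed_text)
print(summed_text)
print(result)
```

This pattern only matches a 5 that is preceded by a space and followed by a comma, so a 5 in the first position of the list would not be picked up.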
Upvotes: 2
Views: 125
Reputation: 15071
This is what you want (all in vectorized numpy):
import numpy as np
a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])  # prepend a 0 so the first diff recovers the first element
aa = np.where(a > 5, 5, a)  # clip values to 5; np.clip(a, None, 5) works too
c = np.cumsum(aa)  # cumulative sum
np.diff(c[aa < 5])  # keep the cumulative sums at positions where the clipped value is below 5, then take differences
array([ 3.2, 4. , 7. , 16.7, 2. , 11. , 3. ])
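The same cumulative-sum idea can be wrapped in a function so the padding does not have to be typed by hand. This is a minimal sketch rather than the exact code above: the name `collapse_capped_runs` and the `cap` parameter are made up for illustration, and keeping a trailing run of capped values as its own sum is an assumption about the desired behaviour.

```python
import numpy as np

def collapse_capped_runs(values, cap=5):
    """Hypothetical helper: clip to `cap`, then merge each run of capped
    values into the first smaller value that follows it."""
    clipped = np.minimum(np.asarray(values, dtype=float), cap)
    # Positions that close a group: every value strictly below the cap.
    closes = clipped < cap
    totals = np.cumsum(clipped)
    # Cumulative sum at each closing position, with a leading 0 so that
    # the first difference recovers the first group.
    ends = np.concatenate(([0.0], totals[closes]))
    merged = np.diff(ends)
    # Assumption: a trailing run of capped values with nothing after it
    # is kept as its own sum instead of being dropped.
    if clipped.size and not closes[-1]:
        merged = np.append(merged, totals[-1] - ends[-1])
    return merged

print(collapse_capped_runs([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]))
```

Because the result comes from differences of floating-point cumulative sums, comparing it with `np.allclose` is safer than testing for exact equality.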
Upvotes: 2
Reputation: 59264
You can use pandas for data manipulation, using cumsum and shift to groupby your values with your logic, and aggregating with sum:
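import pandas as pd
df = pd.DataFrame(a, columns=['col1'])
df.loc[df.col1 > 5] = 5  # cap values at 5
# each value other than 5 closes a group; shift so it stays in the group it closes
s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()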
col1
0.0 3.2
1.0 4.0
2.0 7.0
3.0 16.7
4.0 2.0
5.0 11.0
6.0 3.0
To get a NumPy array back, just use .values
>>> s.values
array([ 3.2, 4. , 7. , 16.7, 2. , 11. , 3. ])
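To see why the cumsum/shift key groups the values correctly, it can help to build it step by step. The sketch below is an illustration under assumptions: it works on a Series instead of a DataFrame, uses `clip(upper=5)` for the capping, and passes `fill_value=0` to `shift` in place of `fillna(0)`. The variable names are hypothetical.

```python
import pandas as pd

col = pd.Series([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]).clip(upper=5)

# True wherever a value is not 5, i.e. wherever a group ends.
is_end = col != 5

# Counting the ends numbers the groups; shifting by one keeps each
# closing value in the group it closes instead of starting the next one.
key = is_end.cumsum().shift(fill_value=0)

result = col.groupby(key).sum()
print(key.tolist())
print(result.to_numpy())
```

Each run of 5s shares its key with the first smaller value after it, which is what lets `sum` merge them.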
Upvotes: 2
Reputation: 2063
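I think you can do this in a single pass. For each item: if it is less than 5, append the running sum plus the item to the result and reset the running sum; otherwise, add 5 to the running sum.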
a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
result = []
current_sum = 0
for item in a:
    if item < 5:
        result.append(current_sum + item)
        current_sum = 0
    else:
        current_sum += 5
if current_sum:
    result.append(current_sum)
>>> result
[3.2, 4, 7, 16.7, 2, 11, 3]
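If this is needed in more than one place, the single pass above could be packaged as a generator. This is only a sketch: the name `merge_capped_runs` and the `cap` parameter are hypothetical and not part of the original code.

```python
def merge_capped_runs(items, cap=5):
    """Hypothetical generator: yield each value below `cap` plus the
    total of the capped run that precedes it."""
    pending = 0  # sum of the capped values seen since the last yield
    for item in items:
        if item < cap:
            yield pending + item
            pending = 0
        else:
            pending += cap
    # A run of capped values at the very end has nothing to attach to,
    # so it is emitted on its own.
    if pending:
        yield pending

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
print(list(merge_capped_runs(a)))
```

Being a generator, it also works on iterables that are not lists and does not build the output until it is consumed.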
Upvotes: 1