Vectorization of loops in Python

Question

I have the following code in Python:

import numpy as np
import pandas as pd
colum1 = [1,2,3,4,5,6,7,8,9,10,11,12]
colum2 = [10,20,30,40,50,60,70,80,90,100,110,120]

df = pd.DataFrame({
    'colum1' : colum1,
    'colum2' : colum2
})

df.loc[df.colum1 == 1,'result'] = df['colum2']

for i in range(len(colum2)):
    df.result = np.where(df.colum1>1, 5 - (df['colum2'] - df.result.shift(1)), df.result)

The resulting DataFrame is:

    colum1  colum2  result
0        1      10    10.0
1        2      20    -5.0
2        3      30   -30.0
3        4      40   -65.0
4        5      50  -110.0
5        6      60  -165.0
6        7      70  -230.0
7        8      80  -305.0
8        9      90  -390.0
9       10     100  -485.0
10      11     110  -590.0
11      12     120  -705.0

I would like to know whether there is a way to obtain the same result without using a for loop.

user3483203 · Accepted Answer

Your operation depends on two things: the previous row of the DataFrame and the difference between consecutive values. That suggests a solution built from shift and diff. The constant 5 also has to be folded into the running (expanding) sum, and that sum is subtracted from each row rather than added to it.
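
Written out row by row, the loop in the question computes result[i] = 5 - (colum2[i] - result[i-1]), starting from result[0] = colum2[0]. A minimal sketch of that recurrence in plain Python, using the question's data (the names `values` and `result` are illustrative only):

```python
# Sample data from the question (the colum2 column).
values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]

# The first row is seeded with its own colum2 value.
result = [float(values[0])]

# Every later row depends on the previous result:
# result[i] = 5 - (values[i] - result[i - 1])
for i in range(1, len(values)):
    result.append(5 - (values[i] - result[i - 1]))

print(result)
```

Because each row needs the row before it, a single element-wise expression is not enough; the dependency has to be expressed through a cumulative operation.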

To set up the pieces of the problem, first build the shifted series: shift colum2 down one row, add 5, and take the cumulative sum:

a = df.colum2.shift().add(5).cumsum().fillna(0)

Next, take the difference between consecutive elements of colum2, and fill the missing first entry with the corresponding value in colum2:

b = df.colum2.diff().fillna(df.colum2)
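
To see what the two helper series contain, their values can be printed side by side. This is only an inspection sketch; it rebuilds `df` from the question's data so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': list(range(1, 13)),
    'colum2': list(range(10, 130, 10)),
})

# Previous row's colum2 plus 5, accumulated down the column. The first
# row has no predecessor, so its NaN is replaced with 0.
a = df.colum2.shift().add(5).cumsum().fillna(0)

# Row-to-row difference; the first row keeps its own colum2 value.
b = df.colum2.diff().fillna(df.colum2)

print(pd.DataFrame({'a': a, 'b': b}))
```

For this data every entry of b is 10, because colum2 rises in steps of 10 and its first value is also 10.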

To get your final result, simply subtract a from b:

b - a

0      10.0
1      -5.0
2     -30.0
3     -65.0
4    -110.0
5    -165.0
6    -230.0
7    -305.0
8    -390.0
9    -485.0
10   -590.0
11   -705.0
Name: colum2, dtype: float64
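
Note that b - a matches the loop for this sample data, where colum2 is 10, 20, 30, ...; for other values the two can differ. As an alternative sketch (not part of the answer above), unrolling the recurrence result[i] = result[i-1] + 5 - colum2[i] gives result[i] = colum2[0] + 5*i - (colum2[1] + ... + colum2[i]), which a single cumsum can express. The helper names `closed_form` and `loop_reference` and the irregular test data are assumptions made for illustration:

```python
import numpy as np
import pandas as pd


def closed_form(col, k=5):
    """Vectorized form of result[i] = k - (col[i] - result[i-1]), result[0] = col[0]."""
    steps = np.arange(len(col))  # 0, 1, 2, ...
    # cumsum includes col[0], so subtract it to get col[1] + ... + col[i]
    tail_sum = col.cumsum() - col.iloc[0]
    return col.iloc[0] - tail_sum + k * steps


def loop_reference(col, k=5):
    """Plain Python loop over the same recurrence, used only as a check."""
    out = [float(col.iloc[0])]
    for value in col.iloc[1:]:
        out.append(k - (value - out[-1]))
    return pd.Series(out, index=col.index)


sample = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120])
irregular = pd.Series([3, 7, 1, 12, 4])  # hypothetical, unevenly spaced data

print(closed_form(sample).tolist())
print(np.allclose(closed_form(irregular), loop_reference(irregular)))
```

The loop is kept only as a reference to compare against; on a real DataFrame the call would be closed_form(df.colum2).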
