Erkan
Erkan

Reputation: 23

Calculating simple moving average pandas for loop

Im currently trying to calculate the simple moving average on a dataset of several stocks. Im trying the code on just two companies (and 4 days time) for simplicity to get it working, but there seem to be some problem with the output. Below is my code.

for index, row in df3.iloc[4:].iterrows():
    if df3.loc[index,'CompanyId'] == df3.loc[index-4,'CompanyId']:
        df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()
    else:
        df3['SMA4'] = 0

And here is the output:Output

The dataframe is sorted by date and company id. So what needs to happen is that when the company id are not equal as stated in the code, the output should be zero since i cant calculate a moving average of two different companies. Instead it output a moving average over both companies like at row 7,8,9.

Upvotes: 2

Views: 2286

Answers (2)

Johannes Riecken
Johannes Riecken

Reputation: 2515

While ansev is right that you should use the specialized function because manual loops are much slower, I want to show why your code didn't work: In both the if branch and the else branch, the entire SMA4 column gets assigned to (df3['SMA4']), and because on the last run through the loop, the if statement is true, so the else statement doesn't have any effect and SMA4 is never 0. So to fix that you could first create the column populated with rolling averages (note that this is not in a for loop):

df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()

And then you run the loop to set invalid rows to 0 (though nan would be better. I kept in the other bugs, assuming that the numbers in ansev's answer are correct):

for index, row in df3.iloc[4:].iterrows(): 
    if df3.loc[index,'CompanyId'] != df3.loc[index-4,'CompanyId']: 
        df3.loc[index,'SMA4'] = 0 

Output (probably still buggy):

    CompanyId  Price   SMA4
0           1     75    NaN
1           1     74    NaN
2           1     77    NaN
3           1     78  76.00
4           1     80  77.25
5           1     79  78.50
6           1     80  79.25
7           2     10   0.00
8           2      9   0.00
9           2     12   0.00
10          2     11   0.00
11          2     11  10.75
12          2      8  10.50
13          2      9   9.75
14          2      8   9.00
15          2      8   8.25
16          2     11   9.00

Upvotes: 0

ansev
ansev

Reputation: 30920

Use groupby.rolling

df['SMA4']=df.groupby('CompanyId',sort=False).rolling(window=4).Price.mean().reset_index(drop='CompanyId')
print(df)

    CompanyId  Price   SMA4
0           1     75    NaN
1           1     74    NaN
2           1     77    NaN
3           1     78  76.00
4           1     80  77.25
5           1     79  78.50
6           1     80  79.25
7           0     10    NaN
8           0      9    NaN
9           0     12    NaN
10          0     11  10.50
11          0     11  10.75
12          0      8  10.50
13          0      9   9.75
14          0      8   9.00
15          0      8   8.25
16          0     11   9.00

Upvotes: 1

Related Questions