Reputation: 23
Im currently trying to calculate the simple moving average on a dataset of several stocks. Im trying the code on just two companies (and 4 days time) for simplicity to get it working, but there seem to be some problem with the output. Below is my code.
for index, row in df3.iloc[4:].iterrows():
if df3.loc[index,'CompanyId'] == df3.loc[index-4,'CompanyId']:
df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()
else:
df3['SMA4'] = 0
And here is the output:Output
The dataframe is sorted by date and company id. So what needs to happen is that when the company id are not equal as stated in the code, the output should be zero since i cant calculate a moving average of two different companies. Instead it output a moving average over both companies like at row 7,8,9.
Upvotes: 2
Views: 2286
Reputation: 2515
While ansev is right that you should use the specialized function because manual loops are much slower, I want to show why your code didn't work:
In both the if branch and the else branch, the entire SMA4 column gets assigned to (df3['SMA4']
), and because on the last run through the loop, the if statement is true, so the else statement doesn't have any effect and SMA4 is never 0. So to fix that you could first create the column populated with rolling averages (note that this is not in a for loop):
df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()
And then you run the loop to set invalid rows to 0 (though nan would be better. I kept in the other bugs, assuming that the numbers in ansev's answer are correct):
for index, row in df3.iloc[4:].iterrows():
if df3.loc[index,'CompanyId'] != df3.loc[index-4,'CompanyId']:
df3.loc[index,'SMA4'] = 0
Output (probably still buggy):
CompanyId Price SMA4
0 1 75 NaN
1 1 74 NaN
2 1 77 NaN
3 1 78 76.00
4 1 80 77.25
5 1 79 78.50
6 1 80 79.25
7 2 10 0.00
8 2 9 0.00
9 2 12 0.00
10 2 11 0.00
11 2 11 10.75
12 2 8 10.50
13 2 9 9.75
14 2 8 9.00
15 2 8 8.25
16 2 11 9.00
Upvotes: 0
Reputation: 30920
Use groupby.rolling
df['SMA4']=df.groupby('CompanyId',sort=False).rolling(window=4).Price.mean().reset_index(drop='CompanyId')
print(df)
CompanyId Price SMA4
0 1 75 NaN
1 1 74 NaN
2 1 77 NaN
3 1 78 76.00
4 1 80 77.25
5 1 79 78.50
6 1 80 79.25
7 0 10 NaN
8 0 9 NaN
9 0 12 NaN
10 0 11 10.50
11 0 11 10.75
12 0 8 10.50
13 0 9 9.75
14 0 8 9.00
15 0 8 8.25
16 0 11 9.00
Upvotes: 1