Reputation: 80
I am working on quite a large DataFrame and would like to speed things up. My data is in the following format:
Name | Number |
---|---|
Condition 1 | 1 |
Condition 1 | 2 |
Condition 1 | 2.2 |
Condition 1 | 2 |
Condition 2 | 1 |
Condition 2 | 1.1 |
Condition 2 | 1.2 |
Condition 2 | 1.3 |
Now I want to apply a rolling mean over the numbers, but without mixing different conditions. My expected outcome would be:
Name | Number | Mean |
---|---|---|
Condition 1 | 1 | nan |
Condition 1 | 2 | 1.5 |
Condition 1 | 2.2 | 2.1 |
Condition 1 | 2 | 2.1 |
Condition 2 | 1 | nan |
Condition 2 | 1.1 | 1.05 |
Condition 2 | 1.2 | 1.15 |
Condition 2 | 1.3 | 1.25 |
What I am doing right now is to iterate over all conditions and save the calculated mean in a different Dataframe, like:
mean = {}
for con in data["Name"].drop_duplicates():
mean[con] = data[data["Name" == con]].rolling(window=2).mean()
But this always takes several minutes, that's why I am looking for an alternative to create the column directly. I tried it with .apply but that doesn't work. At least not with what I tried so far.
Thank you very much!
Upvotes: 0
Views: 132
Reputation: 61910
Group by Name and then transform Number with the rolling mean:
df['Mean'] = df.groupby('Name')['Number'].transform(lambda x: x.rolling(window=2).mean())
print(df)
Output
Name Number Mean
0 Condition 1 1.0 NaN
1 Condition 1 2.0 1.50
2 Condition 1 2.2 2.10
3 Condition 1 2.0 2.10
4 Condition 2 1.0 NaN
5 Condition 2 1.1 1.05
6 Condition 2 1.2 1.15
7 Condition 2 1.3 1.25
Upvotes: 2