Reputation: 73
I have a data frame like this:
        pk_dcdata  threshold  last_ep  diff
window
1        11075761    0.00001        4     3
1        11075768    0.00001        7     6
2        11075769    0.00001        1    -1
2        11075770    0.00001        1    -1
3        11075771    0.00001        1     0
3        11075768    0.00001        7     6
I want to calculate the mean of the column 'diff' for each value of the index 'window', and save the means into a new list. E.g. for window = 1 the mean is (3+6)/2, for window = 2 it is (-1-1)/2, and so on.
Expected outcome: list = [4.5, -1, 3]
I tried to use 'rolling_mean' but don't know how to set the window length. Because the dataset is big, I am hoping for a fast way to get the result.
Upvotes: 2
Views: 89
Reputation: 484
You can use groupby(). Let's say your dataframe is called df:
avg_diff = df['diff'].groupby(level=0).mean()
This gives you a Series holding the mean of 'diff' for each window.
If you then want to put it in a list, you can do it like this:
my_list = avg_diff.tolist()
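For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and runs the two steps above. The way df is constructed is an assumption on my part (the question only shows the printed frame), so adjust it to however your data is actually loaded.

```python
import pandas as pd

# Rebuild the sample frame from the question; 'window' is the index.
# (Construction is assumed, the question only shows the printed output.)
df = pd.DataFrame(
    {
        "pk_dcdata": [11075761, 11075768, 11075769, 11075770, 11075771, 11075768],
        "threshold": [0.00001] * 6,
        "last_ep": [4, 7, 1, 1, 1, 7],
        "diff": [3, 6, -1, -1, 0, 6],
    },
    index=pd.Index([1, 1, 2, 2, 3, 3], name="window"),
)

# Group the 'diff' column by the first (only) index level and average each group.
avg_diff = df["diff"].groupby(level=0).mean()

# The result is a Series indexed by window; convert it to a plain Python list.
my_list = avg_diff.tolist()
print(my_list)  # [4.5, -1.0, 3.0]
```

The grouping is vectorised inside pandas, so it should scale much better than looping over the windows in Python.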
Upvotes: 0
Reputation: 862511
Don't use list as a variable name, because it shadows the Python built-in.
You need to aggregate by mean per index level and then convert the Series to a list:
L = df.groupby(level=0)['diff'].mean().tolist()
#alternative
#L = df.groupby('window')['diff'].mean().tolist()
print(L)
[4.5, -1.0, 3.0]
The alternative (grouping by the index name 'window') works in pandas 0.20.0+; check the docs.
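A quick way to convince yourself that the two spellings agree is a small sketch like the one below. The frame here is a cut-down reconstruction of the question's data (only the 'diff' column, which is my simplification), and it assumes pandas 0.20.0 or newer so that the index name can be used as a grouping key.

```python
import pandas as pd

# Cut-down version of the question's frame: only 'diff', indexed by 'window'.
df = pd.DataFrame(
    {"diff": [3, 6, -1, -1, 0, 6]},
    index=pd.Index([1, 1, 2, 2, 3, 3], name="window"),
)

# Group by index position (works regardless of the index name).
by_level = df.groupby(level=0)["diff"].mean().tolist()

# Group by index name (needs pandas 0.20.0+).
by_name = df.groupby("window")["diff"].mean().tolist()

print(by_level)             # [4.5, -1.0, 3.0]
print(by_level == by_name)  # True
```

Note that groupby sorts the group keys by default; if you need the windows in order of first appearance instead, passing sort=False to groupby should do it.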
Upvotes: 2