SHENGNAN LI

Reputation: 73

Python pandas: calculate the mean of a column per index value

I have a data frame like this:

        pk_dcdata     threshold   last_ep  diff
window                                                            
1        11075761       0.00001         4     3
1        11075768       0.00001         7     6
2        11075769       0.00001         1    -1
2        11075770       0.00001         1    -1
3        11075771       0.00001         1     0
3        11075768       0.00001         7     6
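A minimal sketch to rebuild the frame above for testing. The dtypes (integers for pk_dcdata, last_ep and diff, float for threshold) are assumed from the printed output; 'window' is the index, not a column.

```python
import pandas as pd

# Rebuild the sample frame; 'window' is a non-unique index, not a column.
df = pd.DataFrame(
    {
        "pk_dcdata": [11075761, 11075768, 11075769, 11075770, 11075771, 11075768],
        "threshold": [0.00001] * 6,
        "last_ep": [4, 7, 1, 1, 1, 7],
        "diff": [3, 6, -1, -1, 0, 6],
    },
    index=pd.Index([1, 1, 2, 2, 3, 3], name="window"),
)
print(df)
```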

I want to calculate the mean of column 'diff' for each value of the index 'window' and save the means in a new list. For example, for window = 1 the mean is (3+6)/2, for window = 2 it is (-1-1)/2, and so on.

Expected outcome: list = [4.5,-1,3]

I tried to use 'rolling_mean', but I don't know how to set the window length. Because the dataset is big, I'm hoping for a fast way to get the result.

Upvotes: 2

Views: 89

Answers (2)

Gozy4

Reputation: 484

You can use groupby(). Assuming your DataFrame is called df:

avg_diff = df['diff'].groupby(level=0).mean()

This gives you a Series (not a DataFrame) with the mean of 'diff' for each window. If you then want it as a list, you can do:

my_list = avg_diff.tolist()
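A self-contained version of this answer, using the name avg_diff throughout. Only the 'diff' column of the question's data is rebuilt here, on the assumption that the other columns do not affect the result. Because groupby(...).mean() on a Series returns a Series indexed by window, .tolist() can be called on it directly.

```python
import pandas as pd

# Sample data from the question; only 'diff' is needed, 'window' is the index.
df = pd.DataFrame(
    {"diff": [3, 6, -1, -1, 0, 6]},
    index=pd.Index([1, 1, 2, 2, 3, 3], name="window"),
)

# Group the 'diff' Series by the first (and only) index level and average each group.
avg_diff = df["diff"].groupby(level=0).mean()
print(avg_diff)

# Convert the resulting Series to a plain Python list.
my_list = avg_diff.tolist()
print(my_list)  # [4.5, -1.0, 3.0]
```

groupby sorts the group keys by default, so the list follows the ascending order of the window values.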

Upvotes: 0

jezrael

Reputation: 862511

Don't use list as a variable name: it is not a reserved word, but it shadows Python's built-in list.

Aggregate 'diff' with mean per index level, then convert the resulting Series to a list:

L = df.groupby(level=0)['diff'].mean().tolist()
# alternative
# L = df.groupby('window')['diff'].mean().tolist()
print(L)
[4.5, -1.0, 3.0]

The commented-out alternative, which groups by the index name 'window', works in pandas 0.20.0+; check the docs.
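A small sketch comparing ways of spelling the grouping on the question's 'diff' data. groupby(level='window') is an extra variant not in the original answer; it selects the index level by name and, as far as I know, also works on pandas versions older than 0.20.0.

```python
import pandas as pd

df = pd.DataFrame(
    {"diff": [3, 6, -1, -1, 0, 6]},
    index=pd.Index([1, 1, 2, 2, 3, 3], name="window"),
)

# Grouping by level position, by level name, or by the index name as a key
# all produce the same per-window means.
by_position = df.groupby(level=0)["diff"].mean().tolist()
by_level_name = df.groupby(level="window")["diff"].mean().tolist()
by_key = df.groupby("window")["diff"].mean().tolist()  # pandas 0.20.0+

print(by_position, by_level_name, by_key)
```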

Upvotes: 2
