Ishan Dutta

Reputation: 957

How to aggregate values of a Dataframe by mean in Python?

I have the following DataFrame, which holds the number of posts each user made over a two-week window (days -7 to 7). I want to create another DataFrame with the mean number of posts per day. The code below returns a Series rather than a DataFrame. The result I need should have two separate columns, day and mean.

Part of Dataframe (df)

UserId          Date                -7  -6  -5  -4  -3  -2  -1  0   1   2   3   4   5   6   7
87      2011-05-10 18:38:55.030     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
487     2011-11-29 14:46:12.080     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
21      2012-03-02 14:35:06.867     0   1   0   1   2   0   2   2   0   1   2   2   1   3   1

CODE (To obtain mean posts per day)

df.iloc[:,2:].mean()

Code Output

-7  0
-6  0.33
-5  0.33
-4  0.33
-3  0.66
-2  0
-1  1
0   0.66
1   0
2   0.33
3   0.66
4   0.66
5   0.33
6   1
7   0.33

This output is correct; the only problem is that it is a Series. The expected output should have two separate columns, day and mean, as shown.

Expected Output

day mean
-7  0
-6  0.33
-5  0.33
-4  0.33
-3  0.66
-2  0
-1  1
0   0.66
1   0
2   0.33
3   0.66
4   0.66
5   0.33
6   1
7   0.33

Upvotes: 0

Views: 103

Answers (1)

jezrael

Reputation: 863531

Use Series.rename_axis with Series.reset_index, so that setting the new column names in a separate step is not necessary:

df1 = df.iloc[:,2:].mean().rename_axis('day').reset_index(name='mean')
print(df1)
   day      mean
0   -7  0.000000
1   -6  0.333333
2   -5  0.333333
3   -4  0.333333
4   -3  0.666667
5   -2  0.000000
6   -1  1.000000
7    0  0.666667
8    1  0.000000
9    2  0.333333
10   3  0.666667
11   4  0.666667
12   5  0.333333
13   6  1.000000
14   7  0.333333
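For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and applies the same chain. The construction of df is an assumption: the question does not say whether the day columns are integers or strings, so integer labels are used here.

```python
import pandas as pd

# Assumed reconstruction of the sample rows from the question;
# the day column labels are taken to be integers -7..7.
days = list(range(-7, 8))
rows = [
    [87, "2011-05-10 18:38:55.030", 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [487, "2011-11-29 14:46:12.080", 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [21, "2012-03-02 14:35:06.867", 0, 1, 0, 1, 2, 0, 2, 2, 0, 1, 2, 2, 1, 3, 1],
]
df = pd.DataFrame(rows, columns=["UserId", "Date"] + days)

# Column-wise mean of the day columns gives a Series indexed by day.
means = df.iloc[:, 2:].mean()

# rename_axis names the index 'day'; reset_index turns that index into a
# regular column and names the value column 'mean'.
df1 = means.rename_axis("day").reset_index(name="mean")
print(df1)
```

The name= argument of reset_index only exists on Series, which is why the chain works directly on the result of mean() without a separate rename step.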

EDIT: Working with seaborn 0.11:

import seaborn as sns
sns.lineplot(data=df1, x='day', y='mean', err_style="bars", ci=68)
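One caveat, offered as a hedged note rather than part of the original answer: lineplot estimates its interval from repeated observations at each x value, and df1 has only one row per day, so there may be nothing to draw bars from. A possible workaround is to reshape the original df to long format with DataFrame.melt, so each day has one row per user. The sketch below covers only the pandas step; the sample df and the column name posts are assumptions for illustration.

```python
import pandas as pd

# Assumed reconstruction of the sample rows from the question;
# the day column labels are taken to be integers -7..7.
days = list(range(-7, 8))
rows = [
    [87, "2011-05-10 18:38:55.030", 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [487, "2011-11-29 14:46:12.080", 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [21, "2012-03-02 14:35:06.867", 0, 1, 0, 1, 2, 0, 2, 2, 0, 1, 2, 2, 1, 3, 1],
]
df = pd.DataFrame(rows, columns=["UserId", "Date"] + days)

# Long format: one row per (user, day) pair. 'posts' is a hypothetical
# column name chosen for this example.
long_df = df.melt(id_vars=["UserId", "Date"], var_name="day", value_name="posts")

# Make sure the day column is numeric so it sorts and plots in order.
long_df["day"] = long_df["day"].astype(int)
print(long_df.head())
```

Passing long_df to lineplot with x='day' and y='posts' should then let seaborn compute the per-day mean and the interval itself. Note also that in seaborn 0.12 and later the ci parameter is deprecated in favor of errorbar, for example errorbar=('ci', 68).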

Upvotes: 1
