Reputation: 8072
I have this nice pandas dataframe:
And I want to group it by column 0 (which represents the year) and calculate the mean of the other columns for each year. I do this with the following code:
df.groupby(0)[[2, 3, 4]].mean()
And that successfully calculates the mean of every column. The problem is the seemingly empty row that appears at the top of the output:
Upvotes: 0
Views: 977
Reputation: 394101
That's just a display thing. The grouped column becomes the index of the result, and this is simply how a named index is displayed. Note that even if you set pd.set_option('display.notebook_repr_html', False) you still get this line. It has no effect on operations on the grouped df:
In [30]:
df = pd.DataFrame({'a':np.random.randn(5), 'b':np.random.randn(5), 'c':np.arange(5)})
df
Out[30]:
          a         b  c
0  0.766706 -0.575700  0
1  0.594797 -0.966856  1
2  1.852405  1.003855  2
3 -0.919870 -1.089215  3
4 -0.647769 -0.541440  4
In [31]:
df.groupby('c')[['a','b']].mean()
Out[31]:
          a         b
c
0  0.766706 -0.575700
1  0.594797 -0.966856
2  1.852405  1.003855
3 -0.919870 -1.089215
4 -0.647769 -0.541440
Technically speaking, it has assigned the index's name attribute:
In [32]:
df.groupby('c')[['a','b']].mean().index.name
Out[32]:
'c'
By default, an index has no name if one has not been assigned:
In [34]:
print(df.index.name)
None
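If you would rather not see that extra line, here is a minimal sketch of three ways to get rid of it. The small frame and the variable names (means, unnamed, flat, flat_direct) are made up for illustration and are not from the question; which option fits depends on whether you want to keep the group keys as the index.

```python
import pandas as pd

# Illustrative data only: 'c' plays the role of the grouping (year) column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
    "c": [0, 0, 1, 1],
})

# The grouped column becomes a named index, which is what prints the extra line.
means = df.groupby("c")[["a", "b"]].mean()

# Option 1: keep the group keys as the index but drop the index name.
unnamed = means.rename_axis(None)

# Option 2: move the group keys back into an ordinary column.
flat = means.reset_index()

# Option 3: ask groupby not to use the keys as the index in the first place.
flat_direct = df.groupby("c", as_index=False)[["a", "b"]].mean()

print(unnamed)
print(flat)
print(flat_direct)
```

Options 2 and 3 should give the same frame, with c as a regular column and a default integer index. Option 1 only changes how the result is labelled, not the values.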
Upvotes: 1