Reputation: 3129
Is there a way to omit some of the output from the pandas describe? This command gives me exactly what I want with a table output (count and mean of executeTime's by a simpleDate)
df.groupby('simpleDate').executeTime.describe().unstack(1)
However that's all I want, count and mean. I want to drop std, min, max, etc... So far I've only read how to modify column size.
I'm guessing the answer is going to be to re-write the line, not using describe, but I haven't had any luck grouping by simpleDate and getting the count with a mean on executeTime.
I can do count by date:
df.groupby(['simpleDate']).size()
or executeTime by date:
df.groupby(['simpleDate']).mean()['executeTime'].reset_index()
But can't figure out the syntax to combine them.
My desired output:
count mean
09-10-2013 8 20.523
09-11-2013 4 21.112
09-12-2013 3 18.531
... .. ...
Upvotes: 30
Views: 75866
Reputation: 29
Why do you want to use describe
in first hand and generating more than you need to just discard it? Just generate agg
instead and get directly what you want:
df.groupby('simpleDate').executeTime.agg(['count','max'])
Upvotes: 1
Reputation: 3023
.describe()
attribute generates a Dataframe where count
, std
, max
... are values of the index, so according to the documentation you should use .loc
to retrieve just the index values desired:
df.describe().loc[['count','max']]
Upvotes: 48
Reputation: 466
Looking at the answers, I don't see one that actually works on a DataFrame returned from describe() after using groupby().
The documentation on MultiIndex selection gives a hint at the answer. The .xs() function works for one but not multiple selections, but .loc works.
df.groupby(['simpleDate']).describe().loc[:,(slice(None),['count','max'])]
This keeps the nice MultiIndex returned by .describe() but with only the columns selected.
Upvotes: 21
Reputation: 11
Sticking with describe, you can unstack the indexes and then slice normally too
df.describe().unstack()[['count','max']]
Upvotes: 1
Reputation: 629
The solution @Jeff provided just works for series.
@Rafa is on the point: df.describe().info()
reveals that the resulting dataframe has Index: 8 entries, count to max
df.describe().loc[['count','max']]
does work, but df.groupby('simpleDate').describe().loc[['count','max']]
, which is what the OP asked, does not work.
I think a solution may be this:
df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
'Z': [10, 5, 6, 11, 12],
})
grouping the df by Y
:
df_grouped=df.groupby(by='Y')
In [207]df_grouped.agg([np.mean, len])
Out[207]:
Z
mean len
Y
A 10.500 2
B 7.667 3
Upvotes: 6
Reputation: 129018
Describe returns a series, so you can just select out what you want
In [6]: s = Series(np.random.rand(10))
In [7]: s
Out[7]:
0 0.302041
1 0.353838
2 0.421416
3 0.174497
4 0.600932
5 0.871461
6 0.116874
7 0.233738
8 0.859147
9 0.145515
dtype: float64
In [8]: s.describe()
Out[8]:
count 10.000000
mean 0.407946
std 0.280562
min 0.116874
25% 0.189307
50% 0.327940
75% 0.556053
max 0.871461
dtype: float64
In [9]: s.describe()[['count','mean']]
Out[9]:
count 10.000000
mean 0.407946
dtype: float64
Upvotes: 33