Reputation: 198
I'm new to Python, so I think this is basic, but I cannot find the answer. I have a data frame like this one, which consists of 100 questions:
|Date|QID|Time_1|Answer_1|Time_2|Answer_2|Time_3|Answer_3|
|1/12|001|20 | A | 30 | A | 34 | D |
|1/12|001|22 | A | 10 | A | 12 | D |
|1/12|002|27 | B | 40 | A | 45 | D |
|1/12|002|25 | A | 60 | C | 23 | D |
I want descriptive statistics for the time, such as max, min and mean, computed over all of the Time columns together for each QID.
For the sample data, this is the expected output:
|QID| Mean | Min | Max |
|001| 21.33| 10 | 34 |
|002| 36.67| 23 | 60 |
How can I do that?
I have used:
df.mean(axis=1)
df.max(axis=1)
df.min(axis=1)
But how do I get these descriptive statistics with a group by on QID?
Thank you in advance.
Upvotes: 3
Views: 5034
Reputation: 863741
The main complication is that a mean of means is not, in general, the overall mean, so the mean has to be built from its definition: the sum of sums divided by the sum of counts.
So first aggregate per row with DataFrame.agg, using sum and size instead of mean, then aggregate sum, min and max per QID, and finally divide the two helper columns to get the mean:
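# Select the Time_* columns
cols = df.filter(like='Time').columns
# Per-row sum, count, min and max across the time columns
df1 = df[cols].agg(['sum','size','min','max'], axis=1)
# Combine the per-row results within each QID
df = df1.groupby(df['QID']).agg(m1=('sum','sum'),
                                m2=('size','sum'),
                                Min=('min','min'),
                                Max=('max','max'))
# Overall mean = total sum / total count; pop removes the helper columns
df = df.assign(Mean=df.pop('m1').div(df.pop('m2'))).reset_index()
print(df)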
QID Min Max Mean
0 1 10 34 21.333333
1 2 23 60 36.666667
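As a possible alternative (a sketch, not the approach used above), the time columns can be reshaped to long format first, so that each QID has a single column of times and the ordinary mean, min and max apply directly. The sample frame is retyped from the question, and the names long_df and stats are only illustrative:

```python
import pandas as pd

# Sample data retyped from the question; QID is kept as a string
# so that the leading zeros survive.
df = pd.DataFrame({
    "Date": ["1/12"] * 4,
    "QID": ["001", "001", "002", "002"],
    "Time_1": [20, 22, 27, 25],
    "Answer_1": ["A", "A", "B", "A"],
    "Time_2": [30, 10, 40, 60],
    "Answer_2": ["A", "A", "A", "C"],
    "Time_3": [34, 12, 45, 23],
    "Answer_3": ["D", "D", "D", "D"],
})

# Names of the Time_* columns
time_cols = df.filter(like="Time").columns.tolist()

# One row per (QID, time value) instead of one column per attempt
long_df = df.melt(id_vars="QID", value_vars=time_cols, value_name="Time")

# Named aggregation over the single Time column, grouped by QID
stats = (
    long_df.groupby("QID")["Time"]
    .agg(Mean="mean", Min="min", Max="max")
    .reset_index()
)
print(stats.round(2))
```

Because every time value ends up in one column, the mean here is the overall mean per QID and no separate sum and count columns are needed.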
Upvotes: 4
Reputation: 2884
Use df.describe(); it yields all of the information you're after. .describe() is a DataFrame method, so you can stick it at the end of any GroupBy statement that returns a DataFrame object.
Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html
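A minimal sketch of how that could look for this question, assuming the Time columns are first melted into a single column so that describe() summarises all times per QID rather than each Time column separately. The sample values are retyped from the question, and long_df and summary are illustrative names:

```python
import pandas as pd

# Only the columns needed for the statistics, retyped from the question
df = pd.DataFrame({
    "QID": ["001", "001", "002", "002"],
    "Time_1": [20, 22, 27, 25],
    "Time_2": [30, 10, 40, 60],
    "Time_3": [34, 12, 45, 23],
})

# Stack the three Time_* columns into one "Time" column
long_df = df.melt(id_vars="QID", value_name="Time")

# describe() on the grouped column returns count, mean, std, min,
# the quartiles and max; keep only the three requested statistics
summary = long_df.groupby("QID")["Time"].describe()[["mean", "min", "max"]]
print(summary.round(2))
```

Calling describe() directly on the original wide frame would instead report statistics per Time column, which is why the reshape step is included here.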
Upvotes: 0