FrahChan04

Reputation: 198

How to find descriptive statistics using groupby in pandas dataframe

I'm new to Python, so I think this is basic, but I cannot find how to do it. I have a data frame like this, which consists of 100 questions:

|Date|QID|Time_1|Answer_1|Time_2|Answer_2|Time_3|Answer_3|
|1/12|001|20    |  A     |  30  |   A    |  34  |   D    |
|1/12|001|22    |  A     |  10  |   A    |  12  |   D    |
|1/12|002|27    |  B     |  40  |   A    |  45  |   D    |
|1/12|002|25    |  A     |  60  |   C    |  23  |   D    |

So, I want descriptive statistics for the time columns, such as max, min, and mean, computed over all of the times for each question.

For the sample data, this is the expected output:

|QID| Mean | Min | Max |
|001| 21.33| 10  |  34 |
|002| 36.67| 23  |  60 |

How can I do that?

I have used:

df.mean(axis=1)
df.max(axis=1)
df.min(axis=1)

But how do I find the descriptive statistics using groupby based on QID?

Thank you in advance.

Upvotes: 3

Views: 5034

Answers (2)

jezrael

Reputation: 863741

The main complication is that a mean of means is not, in general, the overall mean, so the mean has to be computed from its definition: the sum of sums divided by the sum of counts.
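To illustrate why the mean is built from sums and counts, here is a minimal sketch using two made-up groups of unequal size (the numbers are hypothetical and not taken from the question's data):

```python
# Two hypothetical groups of unequal size (illustration only).
group_a = [1, 2, 3]
group_b = [10]

# Averaging the two group means weights each group equally.
mean_of_means = (sum(group_a) / len(group_a) + sum(group_b) / len(group_b)) / 2

# Sum of sums divided by sum of counts weights each value equally.
overall_mean = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))

print(mean_of_means)  # 6.0
print(overall_mean)   # 4.0
```

When every group has the same number of values the two results coincide, but the sum/count form stays correct either way.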

So first aggregate per row with DataFrame.agg, using sum and size instead of mean; then aggregate the sum, min and max per QID; finally divide the two helper columns to get the mean:

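# Select the Time_* columns
cols = df.filter(like='Time').columns

# Per-row sum, count, min and max across the time columns
df1 = df[cols].agg(['sum','size','min','max'], axis=1)
# Combine the per-row values within each QID
df = df1.groupby(df['QID']).agg(m1=('sum','sum'),
                                m2=('size','sum'),
                                Min=('min','min'),
                                Max=('max','max'))
# Mean = total sum / total count; pop() removes the helper columns m1 and m2
df = df.assign(Mean=df.pop('m1').div(df.pop('m2'))).reset_index()
print(df)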
   QID  Min  Max       Mean
0    1   10   34  21.333333
1    2   23   60  36.666667
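For comparison, here is a self-contained sketch of a different technique: reshaping the time columns to long format with melt and then grouping. It is not the approach used above. It rebuilds the sample rows from the question, and the names long and stats are chosen only for illustration:

```python
import pandas as pd

# Sample rows from the question; QID kept as strings to preserve leading zeros.
df = pd.DataFrame({
    "Date": ["1/12"] * 4,
    "QID": ["001", "001", "002", "002"],
    "Time_1": [20, 22, 27, 25],
    "Answer_1": ["A", "A", "B", "A"],
    "Time_2": [30, 10, 40, 60],
    "Answer_2": ["A", "A", "A", "C"],
    "Time_3": [34, 12, 45, 23],
    "Answer_3": ["D", "D", "D", "D"],
})

# Names of the Time_* columns as a plain list.
time_cols = df.filter(like="Time").columns.tolist()

# Long format: one row per (QID, single time value).
long = df.melt(id_vars="QID", value_vars=time_cols, value_name="Time")

# With every time value in one column, an ordinary groupby gives the overall statistics.
stats = (long.groupby("QID")["Time"]
             .agg(Mean="mean", Min="min", Max="max")
             .reset_index())
print(stats)
```

Because each time value becomes its own row, the mean here is already the overall mean per QID, at the cost of building a longer intermediate frame.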

Upvotes: 4

vencaslac

Reputation: 2884

Use df.describe(); it yields all of the information you're after. .describe() is a DataFrame method, so you can append it to any GroupBy expression that returns a DataFrame object.

Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html
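A minimal sketch of how describe() could be applied per QID, assuming the time columns are first stacked into a single Series, since describe() on the wide frame summarizes each column separately. The sample rows are rebuilt from the question, and the names times and summary are illustrative:

```python
import pandas as pd

# Sample rows from the question (only the columns needed here).
df = pd.DataFrame({
    "QID": ["001", "001", "002", "002"],
    "Time_1": [20, 22, 27, 25],
    "Time_2": [30, 10, 40, 60],
    "Time_3": [34, 12, 45, 23],
})

# Stack the Time_* columns into one Series indexed by (QID, column name).
times = df.set_index("QID").filter(like="Time").stack()

# describe() on the grouped Series returns count, mean, std, min, quartiles and max per QID.
summary = times.groupby(level="QID").describe()
print(summary[["mean", "min", "max"]])
```

The result contains more statistics than the question asks for, so selecting the mean, min and max columns trims it to the requested output.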

Upvotes: 0
