qwerty

Reputation: 887

"Describe (Pandas)" multiple variables as one variable

I have this DataFrame with variables X1 to X5:

 X1  X2  X3  X4  X5
 14  52   2  76  81
 42  15  86  79  52
 40  96  90  87  51
 74  99   8  75  95
 40  25  52  16  24
 74  58  91   2   9
 56   5   6  36  37
 85  65  17   4   2
 88   6  42  19  11
  3   5  84  33  56

I want to use the pandas describe() function, treating those five variables as one variable.

Expected result:

               X
count  50.000000
mean   45.300000
std    32.567826
min     2.000000
25%    14.250000
50%    42.000000
75%    75.750000
max    99.000000

Upvotes: 0

Views: 1058

Answers (4)

krxat

Reputation: 523

pd.concat([df[x] for x in df]).describe()

should do the trick. All the answers in this post will work, but I would expect this one to be the fastest.
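For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's table and applies the concat approach. The names `df`, `combined`, and `summary` are my own, and the `ignore_index=True` and `.rename('X')` calls are optional additions that are not part of the one-liner above; they only tidy the index and label the result.

```python
import pandas as pd

# The data from the question, one list per column.
df = pd.DataFrame({
    'X1': [14, 42, 40, 74, 40, 74, 56, 85, 88, 3],
    'X2': [52, 15, 96, 99, 25, 58, 5, 65, 6, 5],
    'X3': [2, 86, 90, 8, 52, 91, 6, 17, 42, 84],
    'X4': [76, 79, 87, 75, 16, 2, 36, 4, 19, 33],
    'X5': [81, 52, 51, 95, 24, 9, 37, 2, 11, 56],
})

# Iterating over a DataFrame yields its column labels, so this
# stacks the five columns end to end into one 50-element Series.
combined = pd.concat([df[col] for col in df], ignore_index=True).rename('X')

summary = combined.describe()
print(summary)
```

The printed count, mean, min, and max should match the expected result in the question (50 values, mean 45.3, range 2 to 99).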

Upvotes: 1

Sayandip Dutta

Reputation: 15872

Try df.values.flatten() to flatten all the data into a single column:

>>> pd.DataFrame(df.values.flatten(),columns=['X']).describe()
               X
count  50.000000
mean   45.300000
std    32.567826
min     2.000000
25%    14.250000
50%    42.000000
75%    75.750000
max    99.000000

Or, more easily:

>>> df.stack().describe()
               
count  50.000000
mean   45.300000
std    32.567826
min     2.000000
25%    14.250000
50%    42.000000
75%    75.750000
max    99.000000

However, the result is then a Series, so you will not get a column named X in this case.
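If the X label matters, one possible way to keep it with stack is to convert the stacked Series back to a one-column DataFrame before describing it. This is only a sketch on a small sample of the question's data; `stacked` and `summary` are names I made up for illustration.

```python
import pandas as pd

# First three rows of X1..X3 from the question, to keep the example short.
df = pd.DataFrame({'X1': [14, 42, 40], 'X2': [52, 15, 96], 'X3': [2, 86, 90]})

# stack() returns a Series indexed by (row label, column label).
stacked = df.stack()

# to_frame('X') turns it into a DataFrame with a single column named X,
# so describe() reports the statistics under that name.
summary = stacked.to_frame('X').describe()
print(summary)
```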

References:

  1. df.values
  2. np.ndarray.flatten()
  3. df.stack()

Upvotes: 3

ipj

Reputation: 3598

Another solution using reshape:

pd.Series(df.values.reshape(-1)).describe()

Result:

count    50.000000
mean     45.300000
std      32.567826
min       2.000000
25%      14.250000
50%      42.000000
75%      75.750000
max      99.000000
dtype: float64
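As a side note, reshape(-1), flatten(), and ravel() all produce the same one-dimensional array here; the difference is that flatten() always copies, while reshape(-1) and ravel() return a view when NumPy can manage it. The sketch below checks this on a small sample. It uses `df.to_numpy()` in place of `df.values`, which is my substitution, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X1': [14, 42, 40], 'X2': [52, 15, 96], 'X3': [2, 86, 90]})
values = df.to_numpy()

# Three ways to get a 1-D array; all read the data row by row.
flat_reshape = values.reshape(-1)
flat_flatten = values.flatten()
flat_ravel = values.ravel()

same = (np.array_equal(flat_reshape, flat_flatten)
        and np.array_equal(flat_reshape, flat_ravel))
print(same)

# Naming the Series is optional; it only labels the describe() output.
summary = pd.Series(flat_reshape, name='X').describe()
print(summary)
```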

Upvotes: 1

Daweo

Reputation: 36630

I would use melt for that in the following way (for simplicity I use a smaller data set):

import pandas as pd
df = pd.DataFrame({'X1':[14,42,40],'X2':[52,15,96],'X3':[2,86,90]})
print(df.melt()['value'].describe())

Output:

count     9.000000
mean     48.555556
std      35.330975
min       2.000000
25%      15.000000
50%      42.000000
75%      86.000000
max      96.000000
Name: value, dtype: float64

Explanation: calling .melt on a DataFrame without arguments results in a two-column DataFrame with variables and values. The first column holds the name of the column each value was taken from; the second holds the value itself. For the df from my example, print(df.melt()) gives:

  variable  value
0       X1     14
1       X1     42
2       X1     40
3       X2     52
4       X2     15
5       X2     96
6       X3      2
7       X3     86
8       X3     90
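If you want the result labelled X, or the DataFrame has other columns that should be left out, melt's `value_name` and `value_vars` parameters can help. A small sketch follows; restricting to X1 and X2 is just for illustration and is not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'X1': [14, 42, 40], 'X2': [52, 15, 96], 'X3': [2, 86, 90]})

# value_name renames the 'value' column, so the summary is labelled X.
summary_all = df.melt(value_name='X')['X'].describe()
print(summary_all)

# value_vars limits the melt to the listed columns (here X1 and X2 only).
summary_subset = df.melt(value_vars=['X1', 'X2'], value_name='X')['X'].describe()
print(summary_subset)
```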

Upvotes: 1
