How to calculate statistical values on Pandas dataframe?

Question

I have a Pandas DataFrame and I want to use the describe() method to calculate statistics such as the average and standard deviation for each column.

An example showing the frame's structure:

    X Axis (float array)   Y Axis (complex array)   Val (float)   Class
0   [0, 1, 2...]           [0, 1+1j, 2+2j,...]      1             'd'
1   [0, 1, 2...]           [0, 1+1j, 2+2j,...]      2             'n'
...

I've called the describe() method, but it only shows the count, unique, top and freq values.
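pandas chooses which statistics describe() reports from each column's dtype: object (non-numeric) columns are summarized with count, unique, top and freq, while numeric columns get count, mean, std, min, the quartiles and max. The small sketch below, with made-up values, shows the difference. That the question's columns are stored with object dtype is an assumption, since the frame's dtypes are not shown.

```python
import pandas as pd

# Made-up values: an object-dtype column, like the 'Class' column in the question
labels = pd.Series(["d", "n", "d"], dtype=object)
label_summary = labels.describe()
print(label_summary)  # index: count, unique, top, freq

# A float column gets the numeric summary instead
values = pd.Series([1.0, 2.0, 3.0])
value_summary = values.describe()
print(value_summary)  # index: count, mean, std, min, 25%, 50%, 75%, max
```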

Does anybody know whether it's possible to use the describe() method to calculate statistics like the average and standard deviation for each column? For the complex arrays, the absolute values should be used in the calculations.

Or can someone suggest another way to calculate these values?

Aryan Jain · Accepted Answer

You can use scipy.stats.describe(df) to do this. It returns the number of observations, the minimum and maximum, the mean, the variance (take its square root for the standard deviation), the skewness and the kurtosis. The result is a DescribeResult named tuple: apart from nobs, each field holds an array (minmax holds a pair of arrays) with one element per column of your dataframe, in column order. Example:

In [1]: from scipy import stats

In [2]: import pandas as pd

In [3]: df = pd.DataFrame([[1,2,3],[2,3,4],[3,4,5]], columns=list('abc'))

In [4]: stats.describe(df)
Out[4]: DescribeResult(nobs=3, minmax=(array([1, 2, 3]), array([3, 4, 5])), mean=array([2., 3., 4.]), variance=array([1., 1., 1.]), skewness=array([0., 0., 0.]), kurtosis=array([-1.5, -1.5, -1.5]))
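The example above uses plain numeric columns. For columns whose cells hold whole arrays, as in the question, one option is to flatten each column into a single numeric array first, taking the absolute value when the data is complex, and then summarize it. This is a minimal sketch, assuming each cell holds a NumPy array or a scalar and that the statistics should pool every element of a column across all rows; the frame contents and the helper names `column_values` and `describe_columns` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question's structure (values are made up)
df = pd.DataFrame({
    "X Axis": [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])],
    "Y Axis": [np.array([0, 1 + 1j, 2 + 2j]), np.array([0, 1 + 1j, 2 + 2j])],
    "Val": [1.0, 2.0],
    "Class": ["d", "n"],
})


def column_values(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: pool all cells of one column into a 1-D float array.

    Cells may be scalars or arrays; complex data is replaced by its magnitude.
    """
    values = np.concatenate([np.ravel(cell) for cell in series])
    if np.iscomplexobj(values):
        values = np.abs(values)  # absolute values for the complex column
    return values.astype(float)


def describe_columns(frame: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: pandas-style summary (count, mean, std, ...) per column."""
    return pd.DataFrame(
        {name: pd.Series(column_values(frame[name])).describe() for name in columns}
    )


# 'Class' is left out because it holds labels, not numbers
summary = describe_columns(df, ["X Axis", "Y Axis", "Val"])
print(summary)
```

The same flattened arrays can be passed to scipy.stats.describe if you prefer its output. Its variance uses ddof=1 by default, the same as pandas' std(), so the square root of that variance matches the std row above. If you need statistics per row instead of per column, apply column_values to a single cell rather than to the whole column.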
