Reputation: 205
I have a pandas DataFrame and I want to use the describe() method to calculate statistics such as the mean and standard deviation for each column.
An example showing the frame's structure:
   X Axis (float array)  Y Axis (complex array)  Val (float)  Class
0  [0, 1, 2...]          [0, 1+1j, 2+2j,...]     1            'd'
1  [0, 1, 2...]          [0, 1+1j, 2+2j,...]     2            'n'
....
I've called describe(), but it shows only the count, unique, top and freq values.
Is it possible to use describe() to calculate statistics such as the mean and standard deviation for each column? For the complex arrays, the absolute values should be used in the calculations.
If not, can someone suggest another way to calculate these values?
Upvotes: 0
Views: 472
Reputation: 384
You can use scipy.stats.describe(df) for this, provided every column holds plain numeric values.
This returns all the stats you listed and more. Apart from nobs, which is a single count, each field of the result holds one value per column, in the same order as the columns of your dataframe.
Example:
In [1]: from scipy import stats
In [2]: import pandas as pd
In [3]: df = pd.DataFrame([[1,2,3],[2,3,4],[3,4,5]], columns=list('abc'))
In [4]: stats.describe(df)
Out[4]: DescribeResult(nobs=3, minmax=(array([1, 2, 3]), array([3, 4, 5])), mean=array([2., 3., 4.]), variance=array([1., 1., 1.]), skewness=array([0., 0., 0.]), kurtosis=array([-1.5, -1.5, -1.5]))
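For a frame like the one in the question, where each cell holds an array, the columns have object dtype, so the arrays need to be unpacked before any numeric summary can be computed. The following is only a rough sketch of one way to do that: it pools all arrays of a column into one flat array, takes absolute values (which gives the magnitude for the complex entries), and calls describe() on the result. The column names x, y, val and cls and the helper describe_array_column are made up for illustration, and pooling all rows together is an assumption about what "per column" should mean here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question's layout (column names are assumed).
df = pd.DataFrame({
    "x": [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])],
    "y": [np.array([0, 1 + 1j, 2 + 2j]), np.array([0, 1 + 1j, 2 + 2j])],
    "val": [1.0, 2.0],
    "cls": ["d", "n"],
})


def describe_array_column(col: pd.Series) -> pd.Series:
    """Pool every array in the column and describe the absolute values."""
    pooled = np.concatenate([np.abs(np.asarray(a)) for a in col])
    return pd.Series(pooled).describe()


# One column of statistics per original column; the string column is skipped.
summary = pd.DataFrame({
    "x": describe_array_column(df["x"]),
    "y": describe_array_column(df["y"]),
    "val": df["val"].describe(),
})
print(summary)
```

If you would rather have one set of statistics per row instead of pooling, something like df["y"].apply(lambda a: pd.Series(np.abs(a)).describe()) should give a frame with one line of statistics for each record.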
Upvotes: 1