Reputation: 1915
Is there a more advanced version of the describe() function that pandas has? Normally I would do something like:
import numpy as np
import pandas as pd
r = pd.DataFrame(np.random.randn(1000), columns=['A'])
r.describe()
and I will get a nice summary, like this one:
                 A
count  1000.000000
mean      0.010230
std       0.982562
min      -2.775969
25%      -0.664840
50%       0.015452
75%       0.694440
max       3.101434
Can I find something a little more elaborate in statsmodels or scipy, maybe?
Upvotes: 10
Views: 11867
Reputation: 31
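UPDATE: The ".append" method was deprecated in pandas 1.4 and removed in pandas 2.0. To keep the same function with as little disruption as possible, the "._append" method can be used instead. Note that "._append" is a private method, so it may change without notice; pd.concat is the public replacement.
Here is the updated code: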
import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    # Turn each Series into a one-row DataFrame so it can be appended as a row
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    return stats._append([kurtosis_df, skewness_df])
Everything is the same except for the underscore preceding "append": "._append" instead of ".append".
Reference: "DataFrame object has no attribute append"
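For anyone who would rather not rely on a private method, here is a minimal sketch of the same function built on pd.concat. It is a variant, not the code from the original gist; the sample frame r and the use of to_frame() are choices made for this illustration.

```python
import numpy as np
import pandas as pd

def describex(data):
    """describe() plus kurtosis and skewness rows, using the public pd.concat API."""
    data = pd.DataFrame(data)
    stats = data.describe()
    # to_frame(name).T turns each Series into a one-row DataFrame named `name`
    skewness_df = data.skew().to_frame("skewness").T
    kurtosis_df = data.kurtosis().to_frame("kurtosis").T
    # Stack the extra rows under the describe() output, same order as before
    return pd.concat([stats, kurtosis_df, skewness_df])

# Sample data for illustration only (same shape as the frame in the question)
rng = np.random.default_rng(0)
r = pd.DataFrame(rng.standard_normal(1000), columns=["A"])
print(describex(r))
```

The output has the eight usual describe() rows followed by "kurtosis" and "skewness".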
Upvotes: 1
Reputation: 31
Found this excellent solution after much searching. It is simple and extends the existing describe() method. It adds two rows to the describe() method output, one for kurtosis and one for skew, by creating a new function describex().
Custom function to add skewness and kurtosis in descriptive stats to a pandas dataframe:
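import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    # Turn each Series into a one-row DataFrame so it can be appended as a row
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # DataFrame.append was removed in pandas 2.0; this line needs an older pandas
    return stats.append([kurtosis_df, skewness_df])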
It is similar to the previous answer, but creates a callable function.
source: https://gist.github.com/chkoar/5cb11b22b6733cbd408912b518e43a94
Upvotes: 2
Reputation: 91
from ydata_profiling import ProfileReport
eda = ProfileReport(df)
display(eda)
ydata-profiling (formerly pandas-profiling) is a very powerful tool that gives you an almost complete EDA of your dataset: missing values, correlations, heat maps, and more.
Upvotes: 9
Reputation: 91
I'd rather stick with the pandas library itself (adding variance, skewness, and kurtosis) than use 'external' ones, for example:
stats = df.describe()
stats.loc['var'] = df.var().tolist()
stats.loc['skew'] = df.skew().tolist()
stats.loc['kurt'] = df.kurtosis().tolist()
print(stats)
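A variant of the same idea, as a sketch: if the frame also holds non-numeric columns, describe() drops them by default, so the lists from .tolist() may no longer line up with its columns. Selecting the numeric columns first and assigning each Series directly lets pandas align on column names. The sample frame below, including the "label" column, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns and one text column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.standard_normal(1000),
    "B": rng.exponential(size=1000),
    "label": ["x", "y"] * 500,
})

numeric = df.select_dtypes(include="number")
stats = numeric.describe()
# Assigning a Series aligns on column names, so no .tolist() is needed
stats.loc["var"] = numeric.var()
stats.loc["skew"] = numeric.skew()
stats.loc["kurt"] = numeric.kurtosis()  # excess kurtosis: 0 for a normal distribution
print(stats)
```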
PS: pandas_profiling is amazing though
Yerart
Upvotes: 9
Reputation: 16997
from scipy.stats import describe
describe(r, axis=0)
It will give you the number of observations, (min, max), mean, variance, skewness, and kurtosis.
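As a rough sketch of how that result can be used: describe() returns a named tuple, so its fields can be unpacked into a describe()-style table. The sample frame r and the row labels below are choices made for this example. One thing to watch is that scipy's defaults (ddof=1 for the variance, bias=True for skewness and kurtosis) differ from pandas, whose skew() and kurtosis() are bias-corrected, so the two libraries will report slightly different skewness and kurtosis unless bias=False is passed.

```python
import numpy as np
import pandas as pd
from scipy.stats import describe

# Sample data for illustration only (same shape as the frame in the question)
rng = np.random.default_rng(0)
r = pd.DataFrame(rng.standard_normal(1000), columns=["A"])

# Pass a plain NumPy array; axis=0 computes the statistics per column
result = describe(r.to_numpy(), axis=0)

# DescribeResult fields: nobs, minmax, mean, variance, skewness, kurtosis
summary = pd.DataFrame(
    {
        "nobs": result.nobs,
        "min": result.minmax[0],
        "max": result.minmax[1],
        "mean": result.mean,
        "variance": result.variance,
        "skewness": result.skewness,
        "kurtosis": result.kurtosis,
    },
    index=r.columns,
).T
print(summary)
```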
Upvotes: 13