Reputation: 511
I am trying to implement naive Bayes. After loading some data into a DataFrame in pandas, I found that the describe function reports the statistics I want. I'd like to capture the mean and std of each column of the table but am unsure how to do that. I've tried things like:
df.describe([mean])
df.describe(['mean'])
df.describe().mean
None of these work. I was able to do something similar in R with summary, but I don't know how to do it in Python. Can someone offer some advice?
Upvotes: 16
Views: 25395
Reputation: 2046
If you want the mean or the std of a column of your dataframe, you don't need to go through describe(). Instead, the proper way would be to just call the respective statistical function on the column (which really is a pandas.core.series.Series). Here is an example:
import pandas as pd
# create a dataframe with some numerical data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})
print(df['A'].mean()) # 5.5
print(df['B'].std()) # 2.8751811537130436
See the pandas documentation for the descriptive stats that are built into the pandas Series.
(Let me know if I am misunderstanding what you are trying to do here.)
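Since naive Bayes usually needs the mean and std of every column, it may be easier to compute them all in one call. A minimal sketch, reusing the example frame above; the name stats is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})

# Passing a list of function names to agg returns one row per statistic
# and one column per original column.
stats = df.agg(['mean', 'std'])
print(stats)

# A single value is addressed as (statistic, column).
print(stats.loc['mean', 'A'])  # 5.5
```

Note that std here is the sample standard deviation (ddof=1), the same one describe() reports.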
Upvotes: 0
Reputation: 1
I faced the same problem, and after trying these solutions I got one of them to work. Here I extract the 75% value from the describe function. This is my code:
d = bank.groupby(by=['region', 'Gender']).get_group(('south Moravia', 'Female'))
d.cashwdn.describe()['75%']
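The same idea can be applied to every group at once instead of one get_group call per group. This is only a sketch on a tiny made-up frame: the label and x columns and their values are hypothetical and are not the bank data above.

```python
import pandas as pd

# Hypothetical data: one grouping column and one numeric column.
df = pd.DataFrame({'label': ['a', 'a', 'b', 'b'],
                   'x': [1.0, 3.0, 10.0, 14.0]})

# describe() on a grouped column gives one row per group and
# one column per statistic, so a statistic is selected by column name.
q75 = df.groupby('label')['x'].describe()['75%']
print(q75)

# Per-group mean and std, which is what a Gaussian naive Bayes needs per class.
per_class = df.groupby('label')['x'].agg(['mean', 'std'])
print(per_class)
```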
Upvotes: -1
Reputation: 1
You can try:
import pandas as pd
data = pd.read_csv('./FileName.csv')
# the statistics are row labels of describe(), so select the row with .loc
data.describe().loc['mean']
Upvotes: 0
Reputation: 71
If you further want to extract specific column data then try:
df.describe()['FeatureName']['mean']
Replace 'mean' with any other statistic you want to extract, such as 'std'.
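This works because describe() on a DataFrame keeps the original columns as columns and puts the statistic names in the index. A small sketch with a made-up frame (the column names A and B are placeholders) showing the chained lookup next to the equivalent single .loc lookup:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [2.0, 4.0, 6.0, 8.0, 10.0]})

summary = df.describe()
# Row labels are the statistic names.
print(list(summary.index))

# Column first, then statistic (chained indexing).
chained = summary['A']['mean']
# Statistic and column in one lookup.
direct = summary.loc['mean', 'A']
print(chained, direct)  # 3.0 3.0
```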
Upvotes: 7
Reputation: 39072
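You were close. You don't need the include argument. On a Series, describe() returns a Series indexed by statistic name, so s.describe()['mean'] works. On a DataFrame the statistic names are row labels rather than columns, so use df.describe().loc['mean'] instead.
For example: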
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
s.describe()['mean']
# 3.0
If you want both mean and std, pass a list: s.describe()[['mean', 'std']] for a Series, or df.describe().loc[['mean', 'std']] for a DataFrame. For example,
s.describe()[['mean', 'std']]
# mean 3.000000
# std 1.581139
# dtype: float64
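One caveat when going from a Series to a DataFrame: plain square brackets on a DataFrame select columns, and 'mean' is a row label of describe(), not a column. A sketch with a made-up frame (A and B are placeholder column names):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50]})

# Square brackets look for a column called 'mean', which does not exist.
try:
    df.describe()['mean']
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True
print(column_lookup_failed)  # True

# .loc selects rows by label, giving mean and std for every column.
both = df.describe().loc[['mean', 'std']]
print(both)
```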
Upvotes: 10
Reputation: 3930
Please try something like this:
df.describe(include='all').loc['mean']
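For what it's worth, include='all' only matters when the frame has non-numeric columns. A small sketch on a made-up mixed frame (the x and name columns are hypothetical) showing what each variant returns:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3],
                   'name': ['a', 'b', 'a']})

# With include='all', non-numeric columns are kept and their mean is NaN.
means = df.describe(include='all').loc['mean']
print(means)

# By default describe() on a mixed frame summarizes only the numeric columns.
numeric_means = df.describe().loc['mean']
print(numeric_means)
```

If you only need the numeric statistics, the default call avoids the NaN entries.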
Upvotes: 16