Reputation: 17
I have a dataframe with 11 columns and about 18k rows. The last column is either a 1 or a 0, but when I use .describe() on it, all I get is
count 19020
unique 2
top 1
freq 12332
Name: Class, dtype: int64
as opposed to an actual statistical summary with mean, std, etc.
Is there a way to get that?
Upvotes: 0
Views: 597
Reputation: 3171
You could use
# percentile list
perc = [.20, .40, .60, .80]
# list of dtypes to include
include = ['object', 'float', 'int']
data.describe(percentiles=perc, include=include)
where data is your dataframe (important point).
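To illustrate what the percentiles argument changes, here is a minimal sketch on a made-up frame. The column names and values are hypothetical, and I pass include=['number'] rather than the list above just to keep the output purely numeric:

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe
data = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Class": [1, 0, 1, 1, 0],
})

# Custom percentiles; pandas also keeps the median (50%) in the output
perc = [.20, .40, .60, .80]
summary = data.describe(percentiles=perc, include=["number"])
print(summary)
```

The result has one column per numeric column of data, with rows for count, mean, std, min, the requested percentiles and max.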
Since you are new to Stack Overflow, I'd suggest including some actual code (i.e. something showing how, and on what, you are calling your methods). You'll get better answers.
Upvotes: 1
Reputation: 2335
If your numeric (0, 1) column is not being picked up automatically by .describe(), it might be because it's not actually encoded as an int dtype. You can see this in the documentation of the .describe() method, which tells you that the default for the include parameter covers only numeric types:
None (default) : The result will include all numeric columns.
My suggestion would be the following:
df.dtypes # check datatypes
df['num'] = df['num'].astype(int) # if it's not integer, cast it as such
df.describe(include=['object', 'int64']) # explicitly state the data types you'd like to describe
That is, first check the datatypes (I'm assuming the column is called num and the dataframe df, but feel free to substitute the right names). If this indicator (0, 1) column is indeed not encoded as an integer type, cast it with .astype(int). Then you can freely use df.describe(), and perhaps even specify which data types you want to include in the description output, for more fine-grained control.
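Here is a small sketch of the difference the cast makes, assuming the 0/1 values were read in as strings (the data is made up, and num is the same placeholder column name as above):

```python
import pandas as pd

# Hypothetical data: the 0/1 labels ended up stored as strings
df = pd.DataFrame({"num": ["1", "0", "1", "1"]})

# Non-numeric column: describe() reports count / unique / top / freq
before = df["num"].describe()

# Cast to an integer dtype, then describe again
df["num"] = df["num"].astype(int)
after = df["num"].describe()  # count / mean / std / min / quartiles / max

print(before)
print(after)
```

If the column holds values that cannot be parsed as integers (missing values, stray text), .astype(int) will raise, so checking df.dtypes and a few raw values first is worthwhile.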
Upvotes: 1