Max Gebhard

Reputation: 17

pandas .describe() not returning summary statistics for a binary column

I have a dataframe with 11 columns and roughly 19k rows. The last column contains only 1s and 0s, but when I call .describe() on it, all I get is:

count     19020
unique        2
top           1
freq      12332
Name: Class, dtype: int64

as opposed to an actual statistical analysis with mean, std, etc.

Is there a way to do this?

Upvotes: 0

Views: 597

Answers (2)

zar3bski

Reputation: 3171

You could use

# percentiles to report
perc = [.20, .40, .60, .80]

# dtypes to include in the summary
include = ['object', 'float', 'int']

data.describe(percentiles=perc, include=include)

where data is your dataframe (important point).
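As a rough illustration of that call, here is a minimal sketch on a made-up dataframe. The column names and values are hypothetical stand-ins for the asker's data, and `'number'` is used in place of `'float'` and `'int'` as a generic numeric selector, since how `'int'` is matched can depend on the platform.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (names and values are made up)
data = pd.DataFrame({
    "x": [0.5, 1.5, 2.5, 3.5, 4.5],
    "Class": [1, 0, 1, 1, 0],  # binary column stored as integers
})

# percentiles to report
perc = [.20, .40, .60, .80]

# 'number' selects every numeric dtype (assumption: swapped in for 'float'/'int')
summary = data.describe(percentiles=perc, include=["number"])
print(summary)
```

If the binary column is stored with a numeric dtype, the summary should contain rows such as mean and std for it, alongside the requested percentiles.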

Since you are new to Stack Overflow, I'd suggest including some actual code in your question (i.e. something showing how you call your methods and on what data). You'll get better answers.

Upvotes: 1

tania

Reputation: 2335

If your numeric (0, 1) column is not being picked up automatically by .describe(), it might be because it's not actually encoded as an int dtype. You can see this in the documentation of the .describe() method, which tells you that the default include parameter is only for numeric types:

None (default) : The result will include all numeric columns.

My suggestion would be the following:


df.dtypes  # check the data types
df['num'] = df['num'].astype(int)  # if it's not an integer dtype, cast it as such

df.describe(include=['object', 'int64'])  # explicitly state the data types you'd like to describe

That is, first check the data types (I'm assuming the column is called num and the dataframe df, so substitute your own names). If the 0/1 indicator column is indeed not stored as an integer type, cast it with .astype(int). After that, df.describe() will report the numeric statistics, and you can also pass include to choose which data types appear in the output, for more fine-grained control.
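To make the difference concrete, here is a small sketch under the assumption that the 0/1 values were read in as strings (the asker's actual dtype is not shown, so this is only one possible cause). The dataframe and its values are hypothetical; num and df follow the naming assumed above.

```python
import pandas as pd

# Hypothetical data: a 0/1 indicator that ended up stored as strings
df = pd.DataFrame({"num": ["1", "0", "1", "1"]})

# Non-numeric column: describe() reports count, unique, top, freq
before = df["num"].describe()
print(before)

# Cast to an integer dtype, then describe again
df["num"] = df["num"].astype(int)

# Numeric column: describe() now reports count, mean, std, min, quantiles, max
after = df["num"].describe()
print(after)
```

The first summary has the same shape as the output in the question, while the second includes mean and std, which for a 0/1 column gives the share of ones directly.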

Upvotes: 1
