ASH

Reputation: 20342

Select only categorical columns from a dataframe?

I know how to select numeric fields from one dataframe to another.

df1 = df.select_dtypes(include=[np.number])

I was thinking there should be a similar way to select categorical fields, like this:

df2 = df.select_dtypes(include=['category'])

Of course, that doesn't work; it returns no columns. Is there a way to do this? My data frame has float64 and object dtypes.
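A minimal sketch of why the second call comes back empty, using a hypothetical frame (the column names and values are made up). 'category' matches only the pandas categorical dtype, and text columns like the ones described here are stored as object dtype. Newer pandas releases may infer a dedicated string dtype for text, so the sketch pins dtype="object" to mirror the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: one float64 and one object column
df = pd.DataFrame({
    "PRICE": [10.5, 20.0, 15.25],
    "CITY": pd.Series(["Austin", "Boston", "Chicago"], dtype="object"),
})

df1 = df.select_dtypes(include=[np.number])
df2 = df.select_dtypes(include=["category"])

print(df.dtypes)
print(list(df1.columns))  # ['PRICE']
print(df2.shape)          # (3, 0): no column has the pandas 'category' dtype
```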

Also, I'm trying to split these into continuous and discrete types, and ideally bin the continuous values. The line below seems to work fine:

df1['price_bins'] = pd.cut(df1.PRICE, bins=15)

Is this the preferred way to do this kind of thing?
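A hedged sketch with made-up data (the linspace values stand in for the real PRICE column). pd.cut with bins=15 produces 15 equal-width intervals; pd.qcut, a different technique, produces quantile bins with roughly equal counts. Which one suits depends on whether equal-width ranges or balanced bins matter more for the analysis. The pd.cut result is itself a categorical column, so it is picked up by select_dtypes(include="category").

```python
import numpy as np
import pandas as pd

# Made-up continuous values standing in for the question's PRICE column
df1 = pd.DataFrame({"PRICE": np.linspace(1.0, 100.0, 50)})

# Equal-width binning: 15 intervals of equal width spanning the range of PRICE
df1["price_bins"] = pd.cut(df1["PRICE"], bins=15)

# Equal-frequency binning (a different technique): 5 quantile bins labelled 0..4
df1["price_quintile"] = pd.qcut(df1["PRICE"], q=5, labels=False)

print(df1["price_bins"].cat.categories.size)  # 15
print(df1["price_quintile"].value_counts().sort_index())

# pd.cut returns a categorical column, so it is matched by 'category'
print(list(df1.select_dtypes(include="category").columns))  # ['price_bins']
```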

Upvotes: 1

Views: 9549

Answers (2)

Justin Mathew

Reputation: 11

To select the categorical (string) columns, replace 'category' in your code with 'object':

df2 = df.select_dtypes(include=['object'])
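A small sketch on a hypothetical frame (names and values invented for illustration). The text columns are pinned to dtype="object" because newer pandas releases may infer a dedicated string dtype instead. It also shows the complementary call with exclude, which keeps every non-numeric column; per the pandas notes, object selection returns all object-dtype columns, not only strings.

```python
import pandas as pd

# Hypothetical frame: text columns pinned to object dtype, as in the question
df = pd.DataFrame({
    "PRICE": [10.5, 20.0, 15.25],
    "CITY": pd.Series(["Austin", "Boston", "Chicago"], dtype="object"),
    "ZIP": pd.Series(["73301", "02101", "60601"], dtype="object"),
})

df2 = df.select_dtypes(include=["object"])
print(list(df2.columns))  # ['CITY', 'ZIP']

# Complementary selection: every column that is not numeric
non_numeric = df.select_dtypes(exclude=["number"])
print(list(non_numeric.columns))  # ['CITY', 'ZIP']
```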

Upvotes: 1

call-in-co

Reputation: 291

I have version 0.23.4 of pandas installed, and this is the help documentation from select_dtypes():

Notes

  • To select all numeric types, use np.number or 'number'
  • To select strings you must use the object dtype, but note that this will return all object dtype columns
  • See the NumPy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'
  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'
  • To select Pandas categorical dtypes, use 'category'
  • To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'

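The second-to-last note is the one you're looking for: 'category' selects columns with the pandas categorical dtype. Passing 'category' on its own or wrapped in a list (['category']) gives the same result; if your text columns are stored as object dtype, they won't match until you convert them with astype('category').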
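A sketch that checks a few of the documented selector strings against a hypothetical frame with one column per dtype family (all names and values are made up). It also compares a scalar 'category' with a one-element list, and converts the object columns with astype so that 'category' matches them.

```python
import pandas as pd

# Hypothetical frame with one column per dtype family from the notes
df = pd.DataFrame({
    "price": [1.5, 2.5, 3.5],
    "count": [1, 2, 3],
    "city": pd.Series(["Austin", "Boston", "Austin"], dtype="object"),
    "grade": pd.Categorical(["low", "high", "low"]),
    "when": pd.to_datetime(["2018-01-01", "2018-06-01", "2018-09-01"]),
})

# Selector strings taken from the select_dtypes() notes
selected = {
    sel: list(df.select_dtypes(include=sel).columns)
    for sel in ["number", "object", "category", "datetime"]
}
for sel, cols in selected.items():
    print(sel, cols)

# A scalar string and a one-element list select the same columns
same = df.select_dtypes(include="category").equals(df.select_dtypes(include=["category"]))
print(same)  # True

# Object columns only match 'category' after an explicit conversion
obj_cols = df.select_dtypes(include="object").columns
converted = df.astype({col: "category" for col in obj_cols})
print(list(converted.select_dtypes(include="category").columns))  # ['city', 'grade']
```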

Upvotes: 0
