Reputation: 20342

Select only categorical columns from a dataframe?

I know how to select the numeric columns of one dataframe into another.

df1 = df.select_dtypes(include=[np.number])

I was thinking there should be a similar way to select categorical columns, like this:

df2 = df.select_dtypes(include=['category'])

Of course that doesn't work. Is there a way to do this? I have a data frame with float64 and object datatypes.

Also, I'm trying to split these into continuous and discrete types, and hopefully bin the continuous data points. The line below seems to work fine.

df1['price_bins'] = pd.cut(df1.PRICE, bins=15)

Is this the preferred way to do this kind of thing?

Upvotes: 1

Answers (2)

Justin Mathew

Reputation: 11

To select the categorical columns, just replace 'category' in your code with 'object':

df2 = df.select_dtypes(include=['object'])
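
As a minimal sketch of the above (the column names and values are made up for illustration, and the object dtype is set explicitly because newer pandas versions may infer a dedicated string dtype instead), object columns can be selected directly, or converted to the category dtype first so that 'category' matches them:

```python
import pandas as pd

# Hypothetical example data: one float64 column and two object columns.
# dtype=object is spelled out so the example does not depend on how a
# given pandas version infers the dtype of Python strings.
df = pd.DataFrame({
    "PRICE": [10.5, 20.0, 31.25, 8.0],
    "CITY": pd.Series(["Oslo", "Lima", "Oslo", "Pune"], dtype=object),
    "GRADE": pd.Series(["a", "b", "a", "c"], dtype=object),
})

# Select the object-dtype columns, as suggested in this answer.
df2 = df.select_dtypes(include=["object"])
object_cols = list(df2.columns)

# No column has the category dtype yet, so this selection has no columns.
empty = df.select_dtypes(include=["category"])

# Convert the object columns to category; now 'category' matches them.
df_cat = df.astype({col: "category" for col in object_cols})
cat_cols = list(df_cat.select_dtypes(include=["category"]).columns)

print(object_cols, empty.shape[1], cat_cols)
```

Keep in mind that include=['object'] returns every object column, whether or not it is really categorical in meaning.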

Upvotes: 1

call-in-co

Reputation: 291

I have version 0.23.4 of pandas installed, and this is the help documentation from select_dtypes():

Notes

To select all numeric types, use np.number or 'number'

To select strings you must use the object dtype, but note that this will return all object dtype columns

See the numpy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html

To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

To select Pandas categorical dtypes, use 'category'

To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'

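The second-to-last is the one you're looking for: 'category' selects columns that have the pandas categorical dtype. Passing 'category' or ['category'] is equivalent; either way it only matches columns whose dtype is actually category, so the object columns in your data frame are not returned.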
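
A short sketch of what 'category' matches, assuming a made-up PRICE column standing in for the one in the question: pd.cut returns categorical data, so the binned column is picked up by 'category'. As far as I know, a scalar and a one-element list select the same columns.

```python
import pandas as pd

# Hypothetical data standing in for the asker's PRICE column.
df = pd.DataFrame({"PRICE": [5.0, 12.5, 20.0, 33.0, 47.5, 60.0]})

# pd.cut bins the continuous values; the resulting column has the
# category dtype.
df["price_bins"] = pd.cut(df["PRICE"], bins=3)

# Select categorical columns, once with a scalar and once with a list.
as_scalar = df.select_dtypes(include="category")
as_list = df.select_dtypes(include=["category"])

scalar_cols = list(as_scalar.columns)
list_cols = list(as_list.columns)

print(df.dtypes)
print(scalar_cols, list_cols)
```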

Upvotes: 0
