Amelio Vazquez-Reina

Reputation: 96284

Converting multiple columns to categories in Pandas. apply?

Consider a DataFrame df. I want to convert a set of its columns, to_convert, to categories.

I can certainly do the following:

for col in to_convert:
  df[col] = df[col].astype('category')

but I was surprised that the following does not return a dataframe:

df[to_convert].apply(lambda x: x.astype('category'), axis=0)

which of course makes the following not work:

df[to_convert] = df[to_convert].apply(lambda x: x.astype('category'), axis=0)

Why does apply with axis=0 return a Series even though it is supposed to act on the columns one by one?

Upvotes: 11

Views: 13519

Answers (2)

joelostblom

Reputation: 48919

Note that since pandas 0.23.0 you no longer need apply to convert multiple columns to categorical data types. You can simply call df[to_convert].astype('category') instead (where to_convert is a set of columns as defined in the question). This returns a new DataFrame, so assign the result if you want to keep it.
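As a minimal sketch of the same idea, the snippet below converts a subset of columns and keeps the rest of the frame untouched. The sample data and the extra column n are made up for illustration, and the dict form of astype is used here as one way to get the converted columns back into a full frame; it is not the only way to do it.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": list("aabbcd"),
    "B": list("ffghhe"),
    "n": range(6),
})
to_convert = ["A", "B"]

# astype returns a new DataFrame, so the result has to be kept.
# The dict maps each selected column to its target dtype; columns that
# are not listed (here "n") keep their original dtype.
converted = df.astype({col: "category" for col in to_convert})

print(converted.dtypes)
```

The original df is left unchanged; rebind the name (df = df.astype(...)) if you do not need the unconverted version.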

Upvotes: 4

Jeff

Reputation: 128958

This was just fixed in master, so the fix will be in 0.17.0 (see the issue here). With the fix, apply returns a DataFrame whose columns are categorical:

In [6]: from pandas import DataFrame

In [7]: df = DataFrame({'A' : list('aabbcd'), 'B' : list('ffghhe')})

In [8]: df
Out[8]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [9]: df.dtypes
Out[9]: 
A    object
B    object
dtype: object

In [10]: df.apply(lambda x: x.astype('category'))
Out[10]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [11]: df.apply(lambda x: x.astype('category')).dtypes
Out[11]: 
A    category
B    category
dtype: object
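For a script version of the session above applied to the question's case (a subset of columns), here is a rough sketch. It assumes a pandas version that includes the fix; the extra column n and the variable name result are only there for illustration.

```python
import pandas as pd

# Same two string columns as in the session above, plus a hypothetical
# numeric column "n" that should not be converted.
df = pd.DataFrame({"A": list("aabbcd"), "B": list("ffghhe"), "n": range(6)})
to_convert = ["A", "B"]

# apply walks over the selected columns one by one (axis=0 is the default)
# and, with the fix in place, reassembles them into a DataFrame.
result = df[to_convert].apply(lambda x: x.astype("category"))

print(type(result).__name__)
print(result.dtypes)
```

Only the selected columns appear in result, so it still has to be merged back into df if the other columns are needed.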

Upvotes: 10
