Reputation: 2236
I'm creating an empty DataFrame with a given shape and want to fill it with values from another DataFrame that contains categorical columns. Is it possible for a column in the new DataFrame to share the same categorical dtype as the original column, even if it will hold only a subset of the original's unique values?
Upvotes: 0
Views: 1158
Reputation: 210812
I think you can use the Series.cat.remove_unused_categories() method to drop the categories that no longer occur in the data.
Here is a small demo:
In [311]: df
Out[311]:
    channel day month             t1    title  year
631     AAA  06    01  1388967300000  title 1  2014
632     CBR  06    01  1388973300000  title 2  2014
633     CBR  06    01  1388974500000  title 3  2014
In [312]: df.channel
Out[312]:
631 AAA
632 CBR
633 CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]
In [313]: cp = df[df.channel == 'CBR'].copy()
In [314]: cp.channel
Out[314]:
632 CBR
633 CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]
In [315]: cp.channel.cat.categories
Out[315]: Index(['AAA', 'CBR'], dtype='object')
In [316]: cp['channel'] = cp.channel.cat.remove_unused_categories()
In [317]: cp.channel.cat.categories
Out[317]: Index(['CBR'], dtype='object')
In [318]: cp.channel
Out[318]:
632 CBR
633 CBR
Name: channel, dtype: category
Categories (1, object): [CBR]
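For readers who want to run the demo outside an interactive session, the steps above can be sketched as a standalone script. This is only a minimal sketch: the frame below is a hypothetical stand-in that reproduces just the channel and title columns, and it assigns the result back to the column because remove_unused_categories() returns a new Series (recent pandas releases, 2.0 and later, no longer accept an inplace argument).

```python
import pandas as pd

# Hypothetical stand-in for the frame in the demo; only the
# categorical "channel" column matters for this illustration.
df = pd.DataFrame(
    {
        "channel": pd.Categorical(["AAA", "CBR", "CBR"]),
        "title": ["title 1", "title 2", "title 3"],
    },
    index=[631, 632, 633],
)

# Filtering rows keeps the full set of categories on the copy.
cp = df[df["channel"] == "CBR"].copy()
before = list(cp["channel"].cat.categories)

# Drop categories that no longer occur and assign the result back.
cp["channel"] = cp["channel"].cat.remove_unused_categories()
after = list(cp["channel"].cat.categories)

print(before, after)
```

The original frame is left untouched, so df.channel still carries both categories.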
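UPDATE: if you assign the categorical values to a column of a new DataFrame, the column keeps the original categorical dtype, including the categories that are not used: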
In [328]: new = pd.DataFrame({'x':[1,2]})
In [329]: new['ch'] = df.loc[df.channel == 'CBR', 'channel'].values
In [330]: new
Out[330]:
   x   ch
0  1  CBR
1  2  CBR
In [331]: new.dtypes
Out[331]:
x int64
ch category
dtype: object
In [332]: new.ch.cat.categories
Out[332]: Index(['AAA', 'CBR'], dtype='object')
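If the new frame has to be allocated before the values are known, one possible approach is to reuse the dtype object of the source column. The sketch below is an assumption about the intended workflow rather than part of the session above: the names src and shared_dtype are hypothetical, and the column is pre-filled with missing values before being populated.

```python
import pandas as pd

# Hypothetical source frame with a categorical column.
src = pd.DataFrame({"channel": pd.Categorical(["AAA", "CBR", "CBR"])})
shared_dtype = src["channel"].dtype  # a CategoricalDtype with both categories

# Pre-allocate the new frame; the "ch" column starts out all-missing
# but already carries the categorical dtype of the source column.
new = pd.DataFrame({"x": [1, 2]})
new["ch"] = pd.Categorical([None] * len(new), dtype=shared_dtype)

# Fill with values that belong to the shared categories.
new.loc[0, "ch"] = "CBR"
new.loc[1, "ch"] = "CBR"

print(new["ch"].dtype == shared_dtype)
print(list(new["ch"].cat.categories))
```

Because the dtype is shared, only values that are already among the source categories can be assigned; anything else would first have to be added as a category.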
Upvotes: 1