Alexander Reshytko

Reputation: 2236

Get pandas categorical column categories and assign them as a dtype to another column

I'm creating an empty DataFrame with a given shape and want to fill it with some values from another DataFrame that contains categorical columns. Is it possible to reuse the same categorical dtype for a column in the new DataFrame, even if that column will hold only a subset of the original column's unique values?

Upvotes: 0

Views: 1158

Answers (1)

MaxU - stand with Ukraine

Reputation: 210812

I think you can use the Series.cat.remove_unused_categories() method.

Here is a small demo:

In [311]: df
Out[311]:
    channel day month             t1    title  year
631     AAA  06    01  1388967300000  title 1  2014
632     CBR  06    01  1388973300000  title 2  2014
633     CBR  06    01  1388974500000  title 3  2014

In [312]: df.channel
Out[312]:
631    AAA
632    CBR
633    CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]

In [313]: cp = df[df.channel == 'CBR'].copy()

In [314]: cp.channel
Out[314]:
632    CBR
633    CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]

In [315]: cp.channel.cat.categories
Out[315]: Index(['AAA', 'CBR'], dtype='object')

In [316]: cp['channel'] = cp.channel.cat.remove_unused_categories()

In [317]: cp.channel.cat.categories
Out[317]: Index(['CBR'], dtype='object')

In [318]: cp.channel
Out[318]:
632    CBR
633    CBR
Name: channel, dtype: category
Categories (1, object): [CBR]
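
For anyone who wants to run the demo, here is a minimal self-contained sketch. The frame is a made-up stand-in for `df` above (only the `channel` and `title` columns), and it assumes a recent pandas where `remove_unused_categories()` returns a new Series, so the result is assigned back:

```python
import pandas as pd

# Hypothetical stand-in for the df shown above.
df = pd.DataFrame(
    {
        "channel": pd.Categorical(["AAA", "CBR", "CBR"]),
        "title": ["title 1", "title 2", "title 3"],
    },
    index=[631, 632, 633],
)

# Filtering keeps every original category, even ones no longer present.
cp = df[df["channel"] == "CBR"].copy()
print(cp["channel"].cat.categories.tolist())

# remove_unused_categories() returns a new Series; assign it back to the column.
cp["channel"] = cp["channel"].cat.remove_unused_categories()
print(cp["channel"].cat.categories.tolist())

# The source frame is untouched and still carries both categories.
print(df["channel"].cat.categories.tolist())
```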

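UPDATE: if you assign the categorical values to a column of a new DataFrame, the category dtype (including all of the original categories) is preserved: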

In [328]: new = pd.DataFrame({'x':[1,2]})

In [329]: new['ch'] = df.loc[df.channel == 'CBR', 'channel'].values

In [330]: new
Out[330]:
   x   ch
0  1  CBR
1  2  CBR

In [331]: new.dtypes
Out[331]:
x        int64
ch    category
dtype: object

In [332]: new.ch.cat.categories
Out[332]: Index(['AAA', 'CBR'], dtype='object')
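
If the values for the new column arrive as plain strings rather than as an existing categorical, one possible way to share the dtype is to take the source column's `dtype` (a `CategoricalDtype`) and pass it to `astype`. This is a sketch, not something from the session above; the names `src`, `shared_dtype` and the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical source frame with a categorical column.
src = pd.DataFrame({"channel": pd.Categorical(["AAA", "CBR", "CBR"])})

# The dtype object carries the categories and the ordered flag.
shared_dtype = src["channel"].dtype

# New frame filled with plain strings that are a subset of the categories.
new = pd.DataFrame({"x": [1, 2]})
new["ch"] = pd.Series(["CBR", "CBR"], index=new.index).astype(shared_dtype)

print(new["ch"].cat.categories.tolist())
print(new["ch"].dtype == src["channel"].dtype)
```

Note that with `astype`, any value that is not among the shared categories becomes NaN, so this only fits when the new column really holds a subset of the original values.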

Upvotes: 1
