buhtz

Set a pandas column dtype to the same as an existing category column

I have a DataFrame with one category column. I add a new column and want it to have the same category dtype.

This is the initial data:

   A  B
0  A  0
1  B  1
2  C  2

I add a new column and copy the dtype of column A to it.

df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)

At first this looks OK:

print(df.C)

0    NaN
1    NaN
2    NaN
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']

But as soon as I assign a value to it, the category dtype is lost:

df.C = 'A'
print(df.C)

0    A
1    A
2    A
Name: C, dtype: object

This is the full MWE.

#!/usr/bin/env python3
import pandas as pd

data = {'A': ['A', 'B', 'C'],
        'B': range(3)}

df = pd.DataFrame(data)
df.A = df.A.astype('category')

print(df)

# New empty(!) column
df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)

# OK, the categories are there
print(df.C)

# assign a value (one of the categories) to the whole column
df.C = 'A'

# the categorical dtype is gone
print(df.C)
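A condensed version of the MWE that checks the dtype instead of printing the column makes the change explicit. This is a sketch that uses `df['C']` item access instead of `df.C`; both refer to the same existing column here. The point it shows is that `df['C'] = 'A'` does not write into the existing categorical column but builds a brand-new column from the scalar, so pandas infers a new dtype for it.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)})
df['A'] = df['A'].astype('category')

# empty column converted to the categorical dtype of A
df['C'] = pd.NA
df['C'] = df['C'].astype(df['A'].dtype)
before = df['C'].dtype
print(before)  # category

# assigning a plain scalar builds a brand-new column from 'A';
# pandas infers its dtype from the scalar instead of reusing the categorical dtype
df['C'] = 'A'
after = df['C'].dtype
print(isinstance(after, pd.CategoricalDtype))  # False
```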

By the way: in the real data I copy the dtype between columns of two different DataFrames. But I do not think this matters for this question.
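Since a column's categorical dtype is a standalone `CategoricalDtype` object, it can be taken from one DataFrame and reused for a column of another. The sketch below assumes two hypothetical frames, `df_src` and `df_dst` (illustrative names only), with different indexes. It builds the new values directly as a `pd.Categorical` with that dtype, which avoids the scalar assignment that drops the category.

```python
import pandas as pd

# hypothetical frames standing in for the two DataFrames of the real data
df_src = pd.DataFrame({'A': pd.Categorical(['A', 'B', 'C'])})
df_dst = pd.DataFrame({'B': range(4)}, index=[10, 11, 12, 13])

# the dtype is a standalone CategoricalDtype object that can be reused elsewhere
cat_dtype = df_src['A'].dtype

# build the new values as a Categorical with exactly that dtype;
# an array of matching length is assigned positionally, so df_dst's index does not matter
df_dst['C'] = pd.Categorical(['A'] * len(df_dst), dtype=cat_dtype)

print(df_dst['C'].dtype == cat_dtype)  # True
print(list(df_dst['C'].cat.categories))  # ['A', 'B', 'C']
```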

Answers (1)

Albo

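The plain assignment df.C = 'A' replaces the whole column with a new one built from the scalar, so pandas infers a fresh, non-categorical dtype for it (object in your output). Use one of these options instead: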

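# option 1: build a new Series with A's categorical dtype
# (index=df.index keeps it aligned with df even if the index is not 0..n-1)
df.C = pd.Series(['A'] * len(df.C), index=df.index).astype(df.A.dtype)

# option 2: fill the missing values of the existing categorical column
df.C = df.C.fillna("A")

# option 3, probably the most intuitive: write into the existing column
# note: this is chained assignment; under Copy-on-Write (the default in pandas 3.0) it does not modify df
df.C[:] = "A"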

Each of these options gives the following output for print(df.C):

0    A
1    A
2    A
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']
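All three options work here because 'A' is already one of the categories of A's dtype. The sketch below, a hedged illustration rather than part of the original answer, uses 'D' as an arbitrary value that is not a category to show how options 1 and 2 behave then: `astype` silently turns it into NaN, while `fillna` refuses it. Adding the category first with `.cat.add_categories` makes the value acceptable, but the column's dtype then no longer equals A's dtype.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)})
df['A'] = df['A'].astype('category')
df['C'] = pd.NA
df['C'] = df['C'].astype(df['A'].dtype)

# option 1 with a value that is not a category: astype maps it to NaN without an error
opt1 = pd.Series(['D'] * len(df), index=df.index).astype(df['A'].dtype)
print(opt1.isna().all())  # True

# option 2 with a value that is not a category: fillna refuses it
# (TypeError in recent pandas; older versions may raise ValueError)
rejected = False
try:
    df['C'].fillna('D')
except (TypeError, ValueError):
    rejected = True
print(rejected)  # True

# extending the categories first makes the new value acceptable
wider = df['C'].cat.add_categories(['D']).fillna('D')
print(list(wider))  # ['D', 'D', 'D']
```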
