I have a DataFrame with one category column.
I add a new column and want it to have the same category dtype.
This is the initial data:
A B
0 A 0
1 B 1
2 C 2
I add a new column C and try to copy the dtype of column A to it:
df = pd.DataFrame(data)
df.A = df.A.astype('category')
df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)
This looks OK at first:
print(df.C)
0 NaN
1 NaN
2 NaN
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']
But as soon as I assign a value to it, the category dtype is gone:
df.C = 'A'
print(df.C)
0 A
1 A
2 A
Name: C, dtype: object
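A likely explanation: `df.C = 'A'` does not write into the existing categorical column. Assigning a scalar to a whole column replaces the column with a new one, and pandas infers the new column's dtype from the value itself, so the old categorical dtype is never consulted. A minimal sketch to check this, assuming a reasonably recent pandas; the recast at the end is just one possible workaround, not something from the original post:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)})
df['A'] = df['A'].astype('category')
df['C'] = pd.NA
df['C'] = df['C'].astype(df['A'].dtype)
print(df['C'].dtype)  # category

# Assigning a scalar to the whole column replaces the column: pandas builds a
# new array from the scalar and infers its dtype from the value itself.
df['C'] = 'A'
print(isinstance(df['C'].dtype, pd.CategoricalDtype))  # False

# Workaround: cast again after the assignment, reusing A's dtype object.
df['C'] = df['C'].astype(df['A'].dtype)
print(isinstance(df['C'].dtype, pd.CategoricalDtype))  # True
```

The exact non-categorical dtype after the plain assignment may differ between pandas versions (`object` or a string dtype), which is why the sketch only checks whether it is still categorical.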
This is the full MWE.
#!/usr/bin/env python3
import pandas as pd
data = {'A': ['A', 'B', 'C'],
        'B': range(3)}
df = pd.DataFrame(data)
df.A = df.A.astype('category')
print(df)
# New empty(!) column
df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)
# OK, the categories are there
print(df.C)
# set one value (from the category)
df.C = 'A'
# the category type is gone
print(df.C)
By the way: in the real data I copy the dtype between columns of two different DataFrames, but I do not think that matters for this question.
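Since a `CategoricalDtype` object carries its categories (and the `ordered` flag) with it, the source DataFrame should indeed not matter. A sketch with two hypothetical frames, `src` and `dst`, whose names and data are made up for illustration:

```python
import pandas as pd

# Hypothetical source and target frames standing in for the real data.
src = pd.DataFrame({'A': pd.Categorical(['A', 'B', 'C'])})
dst = pd.DataFrame({'X': ['B', 'A', 'Z']})

cat_dtype = src['A'].dtype            # the dtype object, including its categories
print(list(cat_dtype.categories))     # ['A', 'B', 'C']

# Casting with the dtype object reuses exactly this category set,
# even though dst is a different DataFrame.
dst['X'] = dst['X'].astype(cat_dtype)

# Values outside the categories ('Z' here) become NaN instead of raising.
print(dst['X'].isna().tolist())       # [False, False, True]
```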
Use one of these options instead of assigning the scalar directly:
# set C first option
df.C = pd.Series(['A'] * len(df.C)).astype(df.A.dtype)
# set C second option
df.C = df.C.fillna("A")
# set C third option, probably the most intuitive
# (chained assignment: with pandas Copy-on-Write enabled this no longer modifies df)
df.C[:] = "A"
All three give the following output for print(df.C):
0 A
1 A
2 A
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']
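A quick sketch to check the first two options; `make_df` is a hypothetical helper that rebuilds the question's frame. It also illustrates a caveat with the first option: the new `pd.Series` gets a default 0..n-1 index, and assignment aligns on the index, so on a frame with any other index every row ends up NaN. Passing a `pd.Categorical` array instead sidesteps the alignment, since arrays are assigned by position. The third option is left out because its behaviour depends on whether Copy-on-Write is active.

```python
import pandas as pd

def make_df(index=None):
    """Hypothetical helper: C is an all-missing column with A's category dtype."""
    df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)}, index=index)
    df['A'] = df['A'].astype('category')
    df['C'] = pd.NA
    df['C'] = df['C'].astype(df['A'].dtype)
    return df

# Second option: fill the missing values instead of replacing the column.
df2 = make_df()
df2['C'] = df2['C'].fillna('A')

# First option on a non-default index: the new Series has index 0..2,
# which does not match 10..12, so alignment leaves every row missing.
df_idx = make_df(index=[10, 11, 12])
df_idx['C'] = pd.Series(['A'] * len(df_idx)).astype(df_idx['A'].dtype)
print(df_idx['C'].isna().all())   # True

# A Categorical array is assigned by position, so no alignment happens.
df_idx['C'] = pd.Categorical(['A'] * len(df_idx), dtype=df_idx['A'].dtype)
print(df_idx['C'].tolist())       # ['A', 'A', 'A']
```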