Reputation: 33
I want to use categorical features directly with a CatBoost model, so I need to declare my object columns as categorical. One column in my data frame is an object column containing NACE codes that look like this:
NACE_code
5632 81.101
8060 41.200
15147 43.120
24644 68.100
29144 86.909
37122 68
39853 43
59268 43
108633 70.220
108693 56.102
175820 43.320
184606 41.200
Name: NACE_code, dtype: object
Python doesn't accept this column as a categorical column. Instead it tells me that it is a float, since some of the values contain dots. I am relatively new to Python and have tried different ways to remove the dot from those values, but my latest attempt turns every value that has no dot into NaN.
df['NACE_code'].str.replace(r"(\d)\.", r"\1")
5632 81101
8060 41200
15147 43120
24644 68100
29144 86909
37122 NaN
39853 NaN
59268 NaN
108633 70220
108693 56102
175820 43320
184606 41200
Name: NACE_code, dtype: object
How do I get my column to look like this? I appreciate any help I can get!
5632 81101
8060 41200
15147 43120
24644 68100
29144 86909
37122 68
39853 43
59268 43
108633 70220
108693 56102
175820 43320
184606 41200
Upvotes: 2
Views: 7359
Reputation: 398
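The following code should work:
# Cast every value to str first, so the codes stored as numbers are not turned into NaN.
df.NACE_code = df.NACE_code.astype(str)
# regex=False makes '.' a literal dot instead of the regex "any character".
df.NACE_code = df.NACE_code.str.replace('.', '', regex=False)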
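One caveat about the pattern: as a regular expression, '.' means "any character", and the default of the regex argument of str.replace has changed between pandas versions (it is False from pandas 2.0 on). A minimal sketch of the difference, using a made-up three-value Series rather than the real data, so it is safer to state the intent explicitly:

```python
import re

import pandas as pd

# Hypothetical sample values, not the asker's real column.
codes = pd.Series(["81.101", "41.200", "68"])

# Interpreted as a regex, "." matches every character, so nothing is left.
as_regex = [re.sub(".", "", code) for code in codes]

# Interpreted literally, only the dots are removed.
literal = codes.str.replace(".", "", regex=False)

# Equivalent regex form: escape the dot.
escaped = codes.str.replace(r"\.", "", regex=True)

print(as_regex)
print(literal.tolist())
print(escaped.tolist())
```

Either the literal form or the escaped regex should give the dot-free codes; the unescaped regex is the one to avoid.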
Upvotes: 2
Reputation: 471
Use astype('str') to convert the column to string type before calling str.replace.
Without regex:
df['NACE_code'].astype('str').str.replace(r".", r"", regex=False)
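To see why the cast matters, here is a small sketch that reproduces the NaN problem. It assumes, based on the output in the question, that the codes without a dot are stored as integers while the others are strings; the sample frame below is made up to mirror that.

```python
import pandas as pd

# Hypothetical sample: an object column mixing strings (codes with a dot)
# and integers (codes without one), as the question's output suggests.
df = pd.DataFrame(
    {"NACE_code": ["81.101", "41.200", 68, 43, "70.220"]},
    index=[5632, 8060, 37122, 39853, 108633],
)

# The .str methods return NaN for elements that are not strings,
# which is how the integer rows got lost.
broken = df["NACE_code"].str.replace(".", "", regex=False)

# Casting to str first makes every element a string, so nothing is dropped.
fixed = df["NACE_code"].astype(str).str.replace(".", "", regex=False)

print(broken)
print(fixed)
```

If the dot-free codes were stored as floats instead (68.0 rather than 68), the cast would produce "68.0" and the replacement would give "680", so it is worth checking the element types first, for example with df['NACE_code'].map(type).value_counts().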
Upvotes: 1