Dataframe: How to remove dot in a string

Question

I want to use categorical features directly with a CatBoost model, so I need to declare my object columns as categorical. One column in my data frame is an object column containing NACE codes that look like this:

NACE_code

5632      81.101
8060      41.200
15147     43.120
24644     68.100
29144     86.909
37122         68
39853         43
59268         43
108633    70.220
108693    56.102
175820    43.320
184606    41.200
Name: NACE_code, dtype: object

Python doesn't accept this column as a categorical column. Instead, it tells me that it is a float, since some of the values contain dots. I am relatively new to Python and have tried different ways to remove the dot from those values, but my latest attempt turns all the values without a dot into NaN.

df['NACE_code'].str.replace(r"(\d)\.", r"\1")

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       NaN
39853       NaN
59268       NaN
108633    70220
108693    56102
175820    43320
184606    41200
Name: NACE_code, dtype: object

How do I get my column to look like this? I appreciate any help I can get!

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       68
39853       43
59268       43
108633    70220
108693    56102
175820    43320
184606    41200

Aakash Dusane · Accepted Answer

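# The following code should work:
# Convert every value to a string first, so the .str methods apply to all rows
df['NACE_code'] = df['NACE_code'].astype(str)
# regex=False treats '.' as a literal dot rather than the regex wildcard
df['NACE_code'] = df['NACE_code'].str.replace('.', '', regex=False)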
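A likely reason for the NaN values in the question is that the column mixes real strings with non-string values (for example integers such as 68), and the `.str` accessor returns NaN for any element that is not a string. The sketch below reproduces this on a small made-up Series (the sample values and the names `s`, `broken`, and `fixed` are assumptions for illustration, not the asker's actual data) and shows why converting to `str` first helps:

```python
import pandas as pd

# Hypothetical sample: an object column mixing strings (with dots) and plain ints
s = pd.Series(["81.101", "41.200", 68, 43, "70.220"], name="NACE_code", dtype=object)

# .str methods only operate on string elements; the ints come back as NaN
broken = s.str.replace(".", "", regex=False)

# Casting to str first makes every element a string, so nothing is lost
fixed = s.astype(str).str.replace(".", "", regex=False)

print(broken.tolist())
print(fixed.tolist())
```

Note that if some codes were parsed as floats rather than strings, `astype(str)` would drop trailing zeros (41.200 would become '41.2'), so it may be safer to read the column as text in the first place, for instance with `dtype=str` in `pd.read_csv`.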
