Hackerds

Reputation: 1205

Cannot convert object as strings to int - Unable to parse string

I have a data frame with one column denoting ranges of ages. The data type of the Age column is shown as string. I am trying to convert the string values to numeric so that the model can interpret the features.


I tried the following to convert to 'int'.

df.Age = pd.to_numeric(df.Age)

I get the following error:

ValueError: Unable to parse string "0-17" at position 0

I also tried passing errors='coerce', but it gave me a different error:

df.Age = pd.to_numeric(df.Age, errors='coerce').astype(int)

Error:

ValueError: Cannot convert non-finite values (NA or inf) to integer

But there are no NA values in any column in my df.

Upvotes: 1

Views: 4757

Answers (3)

prosti

Reputation: 46351

Why don't you split each range into two columns?

# Split each "low-high" label once, at the first "-", into two columns
a = df["age"].str.split("-", n=1, expand=True)
df["age_from"] = a[0]
df["age_to"] = a[1]

Here is what I got at the end!

         date    age
0  2018-04-15  12-20
1  2018-04-15   2-30
2  2018-04-18  5-46+

         date    age age_from age_to
0  2018-04-15  12-20       12     20
1  2018-04-15   2-30        2     30
2  2018-04-18  5-46+        5    46+
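
Note that the split columns above are still strings, and "46+" would not parse as a number. A minimal sketch of one way to get integer bounds, assuming labels look like "12-20", "5-46+" or "55+" (the sample data and the regular expression are illustrative, not taken from the question's data):

```python
import pandas as pd

# Illustrative data; "55+" is an assumed open-ended label.
df = pd.DataFrame({"age": ["12-20", "2-30", "5-46+", "55+"]})

# Lower bound, optional "-upper", optional trailing "+".
pattern = r"^(?P<age_from>\d+)(?:-(?P<age_to>\d+))?\+?$"
parts = df["age"].str.extract(pattern)

# The nullable "Int64" dtype keeps a missing upper bound as <NA>
# instead of failing the way astype(int) does on NaN.
df["age_from"] = pd.to_numeric(parts["age_from"]).astype("Int64")
df["age_to"] = pd.to_numeric(parts["age_to"]).astype("Int64")

print(df)
```

Whether an open-ended range should keep a missing upper bound or get some cap is a modelling decision; the sketch simply leaves it missing.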

Upvotes: 0

somiandras

Reputation: 227

Age seems to be a categorical variable, so you should treat it as such. pandas has a neat category dtype which converts your labels to integers under the hood:

df['Age'] = df['Age'].astype('category')

Then you can access the underlying integers using the cat accessor:

codes = df['Age'].cat.codes # This returns integers

You probably also want to make Age an ordered categorical variable; the docs have a neat recipe for that as well.

from pandas.api.types import CategoricalDtype

age_category = CategoricalDtype([...your labels in order...], ordered=True)

df['Age'] = df['Age'].astype(age_category)

Then you can access the underlying codes in the same way and be sure that they reflect the order in which you entered your labels.
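
Putting the pieces together, a small sketch of the ordered version. Only "0-17" comes from the question; the other labels and the sample rows are made up for illustration:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical labels, listed from youngest to oldest.
labels = ["0-17", "18-25", "26-35", "36+"]
df = pd.DataFrame({"Age": ["26-35", "0-17", "36+", "18-25", "0-17"]})

age_category = CategoricalDtype(labels, ordered=True)
df["Age"] = df["Age"].astype(age_category)

# Codes follow the position of each label in `labels`.
df["Age_code"] = df["Age"].cat.codes

print(df)
```

A value that is not in the label list becomes NaN in the column and gets the code -1, so it is worth checking for -1 before feeding the codes to a model.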

Upvotes: 1

alt440

Reputation: 51

At first glance, I would say it is because you are attempting to convert a string that contains more than just an integer. Your string is "0-17", which is not an integer. If it had been "17" or "0", the conversion would have worked.

    val = int("0")
    val = int("17")
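
To make the contrast concrete, here is a small sketch in plain Python. The helper name try_int is made up for this example:

```python
def try_int(text):
    """Return int(text), or None when text is not a plain integer literal."""
    try:
        return int(text)
    except ValueError:
        return None


# "0" and "17" parse; "0-17" does not, because of the "-" in the middle.
results = {s: try_int(s) for s in ["0", "17", "0-17"]}
print(results)
```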

I have no idea what your to_numeric method is, so I am not sure if I am answering your question.

Upvotes: 0
