Reputation: 1205
I have a data frame with one column denoting age ranges. The data type of the Age column is shown as string. I am trying to convert the string values to numeric so the model can interpret the features.
I tried the following to convert to 'int'.
df.Age = pd.to_numeric(df.Age)
I get the following error:
ValueError: Unable to parse string "0-17" at position 0
I also tried using the errors='coerce' parameter, but it gave me a different error:
df.Age = pd.to_numeric(df.Age, errors='coerce').astype(int)
Error:
ValueError: Cannot convert non-finite values (NA or inf) to integer
But there are no NA values in any column in my df.
Upvotes: 1
Views: 4757
Reputation: 46351
Why don't you split the column on the hyphen?
a = df["age"].str.split("-", n=1, expand=True)  # n=1: split on the first hyphen only
df['age_from'] = a[0]
df['age_to'] = a[1]
Here is what I got at the end!
         date    age
0  2018-04-15  12-20
1  2018-04-15   2-30
2  2018-04-18  5-46+
         date    age age_from age_to
0  2018-04-15  12-20       12     20
1  2018-04-15   2-30        2     30
2  2018-04-18  5-46+        5    46+
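Note that the split columns are still strings. A minimal sketch of converting them to numbers, reusing the sample frame above and assuming a trailing "+" only marks an open-ended upper bound (the names parts and the rstrip step are my additions, not part of the original snippet):

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({
    "date": ["2018-04-15", "2018-04-15", "2018-04-18"],
    "age": ["12-20", "2-30", "5-46+"],
})

# Split on the first hyphen only; expand=True returns one column per part.
parts = df["age"].str.split("-", n=1, expand=True)

# Convert the lower bound directly.
df["age_from"] = pd.to_numeric(parts[0])

# Assumption: a trailing "+" is just a marker, so strip it before converting.
df["age_to"] = pd.to_numeric(parts[1].str.rstrip("+"))

print(df.dtypes)
```

A label with no hyphen at all would leave the second column missing for that row, so it would need separate handling.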
Upvotes: 0
Reputation: 227
Age seems to be a categorical variable, so you should treat it as such. pandas has a neat category dtype which converts your labels to integers under the hood:
df['Age'] = df['Age'].astype('category')
Then you can access the underlying integers using the cat accessor:
codes = df['Age'].cat.codes # This returns integers
Also, you probably want to make Age an ordered categorical variable, for which you can also find a neat recipe in the docs.
from pandas.api.types import CategoricalDtype
age_category = CategoricalDtype([...your labels in order...], ordered=True)
df['Age'] = df['Age'].astype(age_category)
Then you can access the underlying codes in the same way and be sure that they will reflect the order you entered for your labels.
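Here is a small self-contained sketch of the ordered version. Only "0-17" comes from the question; the other labels and the sample rows are hypothetical, so substitute the values that actually occur in your column:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical labels, listed from youngest to oldest.
labels = ["0-17", "18-25", "26-35"]

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({"Age": ["18-25", "0-17", "26-35", "0-17"]})

# The order of `labels` defines the order of the integer codes.
age_category = CategoricalDtype(labels, ordered=True)
df["Age"] = df["Age"].astype(age_category)

codes = df["Age"].cat.codes
print(codes.tolist())  # [1, 0, 2, 0]
```

Any value that is not in the label list becomes missing and gets the code -1, so it is worth checking for -1 before feeding the codes to a model.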
Upvotes: 1
Reputation: 51
At first glance, I would say it is because you are attempting to convert a string that has not only an int in it. Your string is "0-17", which is not an integer. If it had been "17" or "0", the conversion would have worked.
val = int("0")
val = int("17")
I have no idea what your to_numeric method is, so I am not sure if I am answering your question.
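For reference, the to_numeric in the question appears to be pandas' pd.to_numeric. A rough sketch of how it treats such strings, which would also explain the second error (the two-element series is made up for illustration):

```python
import pandas as pd

# Made-up sample: one range label and one plain number.
ages = pd.Series(["0-17", "17"])

# errors="coerce" replaces strings that cannot be parsed with NaN
# instead of raising an error.
converted = pd.to_numeric(ages, errors="coerce")
print(converted.isna().tolist())  # [True, False]

# NaN has no plain-integer representation, so .astype(int) would raise here.
# The nullable "Int64" dtype keeps the missing entry as <NA> instead.
nullable = converted.astype("Int64")
print(nullable)
```

So the NA values are not in the original column; they are created by the coercion of labels like "0-17".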
Upvotes: 0