Joker312

Reputation: 59

Pandas to_numeric

I am trying to use pandas.to_numeric() in order to convert the value of a column in my DataFrame to integers. The DataFrame is as follows:

QuestionID Value
0 Q1 150.0
1 Q2 160.0
2 Q3 NaN
3 Q4 210.0
4 Q5 Hello

How can I use pandas.to_numeric() to convert the values to integers when the column contains NaN and a string such as Hello, while also dropping the rows that cannot be converted?

My expected dataframe is as follows:

QuestionID Value
0 Q1 150
1 Q2 160
3 Q4 210

Upvotes: 1

Views: 2770

Answers (2)

hyit

Reputation: 721

import pandas as pd

df = pd.DataFrame([["Q1", "150"], ["Q2", "160"], ["Q3", "NaN"],
                   ["Q4", "210"], ["Q5", "Hello"]], columns=["QuestionID", "Value"])
df

  QuestionID  Value
0         Q1    150
1         Q2    160
2         Q3    NaN
3         Q4    210
4         Q5  Hello

Since you want to drop all invalid rows, you could use pd.Series.str.isnumeric() as a boolean mask:

df = df[df["Value"].str.isnumeric()].copy()  # Keep rows whose "Value" is a numeric string; copy to avoid SettingWithCopyWarning
df["Value"] = df["Value"].astype(int)  # Replace the column so its dtype becomes integer
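One caveat worth checking before relying on this approach: str.isnumeric() is True only when every character in the string is a numeric character, so a value written with a decimal point (like the 150.0 in the question) is rejected. A minimal sketch, assuming the values are stored as strings; the sample Series here is made up for illustration:

```python
import pandas as pd

# Hypothetical sample mixing an integer string, a decimal string and invalid entries
values = pd.Series(["150", "160.0", "NaN", "Hello"])

# Only "150" consists purely of numeric characters;
# the "." in "160.0" makes isnumeric() return False for it.
mask = values.str.isnumeric()
print(mask.tolist())
```

If the column can hold decimal strings or real (non-string) NaN values, the pd.to_numeric route is likely the safer choice.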

Alternatively, building on @Chris's suggestion, you can add the integer cast after the df.assign call:

df.assign(Value=pd.to_numeric(df["Value"], errors='coerce')).dropna().astype({"Value": int})

Upvotes: 0

Chris

Reputation: 16147

errors='coerce' makes to_numeric return NaN for any non-numeric value; you can then drop those rows with dropna.

df.assign(Value=pd.to_numeric(df.Value, errors='coerce')).dropna()
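For reference, here is a self-contained sketch of that approach run against the data from the question. It assumes the column mixes floats, a real NaN and the string "Hello"; the subset= argument and the final integer cast are additions for illustration, not part of the one-liner above:

```python
import numpy as np
import pandas as pd

# Data rebuilt from the question (assumed: floats, a real NaN, and one string)
df = pd.DataFrame({
    "QuestionID": ["Q1", "Q2", "Q3", "Q4", "Q5"],
    "Value": [150.0, 160.0, np.nan, 210.0, "Hello"],
})

# errors="coerce" turns anything that cannot be parsed ("Hello") into NaN
numeric = pd.to_numeric(df["Value"], errors="coerce")

result = (
    df.assign(Value=numeric)
      .dropna(subset=["Value"])   # drop only rows where Value is NaN
      .astype({"Value": int})     # cast is safe once no NaN remains
)
print(result)
```

Restricting dropna to the Value column keeps rows that happen to have missing data in other columns, which may or may not be what you want.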

Upvotes: 3
