Reputation: 536
Assume a pandas df with many columns. I am trying to convert all non-numeric values into np.nan values using pd.to_numeric as specified below. However, I do not want to apply this to the first two columns; rather, it would only be applied to all columns other than the first two.
For instance, assume the following:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['Adam', 'Bob', 'Chuck', 'David'],
                   'color': ['blue', 'green', 'red', 'yellow'],
                   'number1': [50, 750, 'ad098', 'baseball'],
                   'number2': [25, 'text', 1000, '200']})
Generally, I would just call out the names of the two columns that should be excluded. However, in this case, I am trying to create a framework that can be applied to any df regardless of the names of the columns. Hence, I want to exclude the first two columns on the basis of their positions (0 and 1, i.e. the slice [0:2]).
I am able to successfully convert all non-numeric values in all columns to np.nan using the following:
df = df.apply(pd.to_numeric, errors='coerce')
However, when I try to exclude the first two columns using either of the two methods below, I get an error.
df = df[df.columns[2:].apply(pd.to_numeric, errors='coerce')]
gives the error: "AttributeError: 'Index' object has no attribute 'apply'"
df = df[df.iloc[:,2:].apply(pd.to_numeric, errors='coerce')]
gives the error: "ValueError: Boolean array expected for the condition, not object"
Clearly I am doing something wrong, but I can't figure out what it is. Any help would be greatly appreciated. Thank you.
Upvotes: 0
Views: 1001
Reputation: 7509
Try with:
df.iloc[:, 2:] = df.iloc[:, 2:].apply(pd.to_numeric, errors='coerce')
This reads as "replace the columns after the first two with those same columns after applying method X".
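As a minimal sketch of that line in action, here it is applied to the sample frame from the question (with the missing comma after the 'color' list added so that it runs):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Adam', 'Bob', 'Chuck', 'David'],
                   'color': ['blue', 'green', 'red', 'yellow'],
                   'number1': [50, 750, 'ad098', 'baseball'],
                   'number2': [25, 'text', 1000, '200']})

# Left-hand side: WHERE to write (every row, columns from position 2 onward).
# Right-hand side: WHAT to write (those same columns, coerced to numbers).
df.iloc[:, 2:] = df.iloc[:, 2:].apply(pd.to_numeric, errors='coerce')

print(df)
```

The first two columns are left untouched, and 'ad098', 'baseball' and 'text' become NaN. One caveat, as far as I know: depending on your pandas version, assigning through iloc may keep the original object dtype on those columns even though the values are now numbers and NaN, so check df.dtypes if the dtype matters to you.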
Writing df[something] is simply selecting the columns of df using the object something - a sequence of indices or column names, for example.
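For illustration, a small sketch of what df[something] does when something is a list or Index of column labels (the frame here is just the sample data from the question):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Adam', 'Bob', 'Chuck', 'David'],
                   'color': ['blue', 'green', 'red', 'yellow'],
                   'number1': [50, 750, 'ad098', 'baseball'],
                   'number2': [25, 'text', 1000, '200']})

# df.columns[2:] is an Index of labels, so it can be used as a selector...
labels = df.columns[2:]
selected = df[labels]

# ...but the Index only holds the column names, not the column data,
# which is why calling .apply on it raises an AttributeError.
print(list(labels))
print(selected.shape)
```

In other words, df.columns[2:] is a fine thing to put inside the brackets, but the data you want to transform is df[df.columns[2:]], not the Index itself.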
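So when you write an expression like df[df.iloc[:,2:].apply(pd.to_numeric, errors='coerce')], your something is a DataFrame (the value returned from the expression df.iloc[:,2:].apply(pd.to_numeric, errors='coerce')). pandas treats a DataFrame inside the brackets as a boolean mask, and since this one holds numbers and NaN rather than True/False values, you get the "Boolean array expected" error.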
Effectively, you were confusing the values used to select columns with the values you wanted to replace those columns with.
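Since you mentioned wanting a framework that works on any df regardless of column names, here is one possible way to wrap this up. Note that coerce_after_first and its n_skip parameter are names I made up for this sketch, not anything from pandas. It uses pd.concat instead of assigning through iloc, so it returns a new frame and leaves the original alone:

```python
import pandas as pd

def coerce_after_first(frame, n_skip=2):
    """Return a copy of frame with every column after the first n_skip
    converted by pd.to_numeric; values that cannot be parsed become NaN.
    (Hypothetical helper, written for illustration only.)"""
    untouched = frame.iloc[:, :n_skip]
    coerced = frame.iloc[:, n_skip:].apply(pd.to_numeric, errors='coerce')
    # Glue the two pieces back together side by side, in the original order.
    return pd.concat([untouched, coerced], axis=1)

df = pd.DataFrame({'name': ['Adam', 'Bob', 'Chuck', 'David'],
                   'color': ['blue', 'green', 'red', 'yellow'],
                   'number1': [50, 750, 'ad098', 'baseball'],
                   'number2': [25, 'text', 1000, '200']})

result = coerce_after_first(df, n_skip=2)
print(result)
print(result.dtypes)
```

Because the columns are picked by position, it does not care what they are called, and the coerced columns come back with a numeric dtype.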
Upvotes: 1