Reputation: 1183
I have the following column in my dataframe:
Column1 Column2 Column3 Column4
a 1 2 a
1 2 a c
b 3 c d
3 2 1 b
4 2 1 a
c 1 d a
The type of these columns is object
, I would like to convert Column1
, Column2
and Column3
to numeric type int8
while keeping Column4
as type object. To do so, I had tried to use pd.to_numeric(data.Column1)
(I was planning to do the same after with Column2
and Column3
) but I get the following error:
ValueError: Unable to parse string "a" at position 0
Which is obvious why it is happening. I was wondering if there is any way that would allow me to get rid of of that rows formed by strings in these 3 columns so after that, I would get:
Column1 Column 2 Column 3 Column 4
3 2 1 b
4 2 1 a
Is there a way to achieve so? Or is there another way that would allow me to?
Edit: I have checked the question in Remove non-numeric rows in one column with pandas, but it didn't solve my problem, as I have more columns in my dataset than just two and one of them I don't want to convert it to numeric.
Upvotes: 2
Views: 3363
Reputation: 862481
Use apply
with to_numeric
for replace non numeric to missing values, then remove NaN
s rows by dropna
and last cast to integer
s:
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna().astype(int)
print (df)
Column1 Column2 Column3
3 3 2 1
4 4 2 1
Detail:
print (df.apply(lambda x: pd.to_numeric(x,errors='coerce')))
Column1 Column2 Column3
0 NaN 1.0 2.0
1 1.0 2.0 NaN
2 NaN NaN NaN
3 3.0 2.0 1.0
4 4.0 2.0 1.0
5 NaN 1.0 NaN
EDIT:
Another solution is check if not missing values with DataFrame.all
with boolean indexing
:
cols = ['Column1','Column2','Column3']
#define columns for check numeric
mask = df[cols].apply(lambda x: pd.to_numeric(x, errors='coerce')).notnull().all(axis=1)
#filtering
df = df[mask]
#converting to integers
df[cols] = df[cols].astype(int)
print (df)
Column1 Column2 Column3 Column4
3 3 2 1 b
4 4 2 1 a
print (df.dtypes)
Column1 int32
Column2 int32
Column3 int32
Column4 object
dtype: object
Upvotes: 3