Reputation: 11813
I read my text file into pandas dataframe. All columns are object datatype. What I need to do is convert all those columns that appears 'numeric' to numeric columns. If there are ust a few columns, it's very easy. But my real dataframe has over two hundred columns. I wonder if there is anyway to convert those columns to numeric while keep those which are unable to be converted intact. For example, I have the dataframe below.
df = pd.DataFrame({'a': ['1', '2', 'NA', '4'],
'b': ['a', 'b', 'c', 'd'],
'c': ['aa', 'bb', 'cc', 'dd'],
'd': ['11', '22', '33', '44']})
df[['a', 'b', 'c', 'd']] = df[['a', 'b', 'c', 'd']].astype(int)
I got error. How can I convert column a and d to numeric while keep b and c as object? Again, my real dataframe has many columns, this is just an example to illustrate my point. I don't want to do allthe hardcoding to convert dtype for every columns.Thanks a lot.
Upvotes: 4
Views: 3475
Reputation: 323326
Op1. I usually using to_numeric
then fillna
(The reason : I usually have some mixed dtype within one column )
df=df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='coerce').fillna(df)
df.dtypes
Out[605]:
a int64
b object
c object
d int64
dtype: object
Op2. Or you can using to_numeric
+ignore
df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='ignore').dtypes
Out[608]:
a int64
b object
c object
d int64
dtype: object
Update
df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='coerce').fillna(df).applymap(type)
Out[652]:
a b c d
0 <class 'float'> <class 'str'> <class 'str'> <class 'int'>
1 <class 'float'> <class 'str'> <class 'str'> <class 'int'>
2 <class 'str'> <class 'str'> <class 'str'> <class 'int'>
3 <class 'float'> <class 'str'> <class 'str'> <class 'int'>
If you want , you can add df = df.replace('NA',np.nan)
before running the 1st
Update 2
s=df.apply(pd.to_numeric,errors='coerce').dropna(axis=1,thresh=1)
pd.concat([s,df.loc[:,~df.columns.isin(s.columns)]],1).dtypes
Out[668]:
a float64
d int64
b object
c object
dtype: object
Upvotes: 4