zesla
zesla

Reputation: 11813

how to convert columns to numeric while keep those failed intact in pandas

I read my text file into pandas dataframe. All columns are object datatype. What I need to do is convert all those columns that appears 'numeric' to numeric columns. If there are ust a few columns, it's very easy. But my real dataframe has over two hundred columns. I wonder if there is anyway to convert those columns to numeric while keep those which are unable to be converted intact. For example, I have the dataframe below.

df = pd.DataFrame({'a': ['1', '2', 'NA', '4'],
              'b': ['a', 'b', 'c', 'd'],
              'c': ['aa', 'bb', 'cc', 'dd'],
              'd': ['11', '22', '33', '44']})
df[['a', 'b', 'c', 'd']] = df[['a', 'b', 'c', 'd']].astype(int)

I got error. How can I convert column a and d to numeric while keep b and c as object? Again, my real dataframe has many columns, this is just an example to illustrate my point. I don't want to do allthe hardcoding to convert dtype for every columns.Thanks a lot.

Upvotes: 4

Views: 3475

Answers (2)

BENY
BENY

Reputation: 323326

Op1. I usually using to_numeric then fillna (The reason : I usually have some mixed dtype within one column )

df=df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='coerce').fillna(df)
df.dtypes
Out[605]: 
a     int64
b    object
c    object
d     int64
dtype: object

Op2. Or you can using to_numeric+ignore

df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='ignore').dtypes
Out[608]: 
a     int64
b    object
c    object
d     int64
dtype: object

Update

df[['a', 'b', 'c', 'd']].apply(pd.to_numeric,errors='coerce').fillna(df).applymap(type)
Out[652]: 
                 a              b              c              d
0  <class 'float'>  <class 'str'>  <class 'str'>  <class 'int'>
1  <class 'float'>  <class 'str'>  <class 'str'>  <class 'int'>
2    <class 'str'>  <class 'str'>  <class 'str'>  <class 'int'>
3  <class 'float'>  <class 'str'>  <class 'str'>  <class 'int'>

If you want , you can add df = df.replace('NA',np.nan) before running the 1st

Update 2

s=df.apply(pd.to_numeric,errors='coerce').dropna(axis=1,thresh=1)

pd.concat([s,df.loc[:,~df.columns.isin(s.columns)]],1).dtypes
Out[668]: 
a    float64
d      int64
b     object
c     object
dtype: object

Upvotes: 4

Shenglin Chen
Shenglin Chen

Reputation: 4554

df[['a','d']]=df[['a','d']].apply(lambda x: x.astype(int))

Upvotes: 0

Related Questions