Reputation: 311
I have a dataframe containing strings, as read from a sloppy csv:
id Total B C ...
0 56 974 20 739 34 482
1 29 479 10 253 16 704
2 86 961 29 837 43 593
3 52 687 22 921 28 299
4 23 794 7 646 15 600
What I want to do: convert every cell in the frame into a number. It should be ignoring whitespaces, but put NaN where the cell contains something really strange. I probably know how to do it using terribly unperformant manual looping and replacing values, but was wondering if there's a nice and clean why to do this.
Upvotes: 1
Views: 104
Reputation: 862611
You can use read_csv
with regex separator \s{2,}
- 2 or more whitespaces and parameter thousands
:
import pandas as pd
from pandas.compat import StringIO
temp=u"""id Total B C
0 56 974 20 739 34 482
1 29 479 10 253 16 704
2 86 961 29 837 43 593
3 52 687 22 921 28 299
4 23 794 7 646 15 600 """
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="\s{2,}", engine='python', thousands=' ')
print (df)
id Total B C
0 0 56974 20739 34482
1 1 29479 10253 16704
2 2 86961 29837 43593
3 3 52687 22921 28299
4 4 23794 7646 15600
print (df.dtypes)
id int64
Total int64
B int64
C int64
dtype: object
And then if necessary apply
function to_numeric
with parameter errors='coerce'
- it replace non numeric to NaN
:
df = df.apply(pd.to_numeric, errors='coerce')
Upvotes: 2