Reputation: 61
I have a large dataframe (~1 million rows, 9 columns) with some rows missing data in a few of the columns.
dat = pd.read_table('file path', delimiter=';')
   I       z  Sp      S     B   B/T     r    gf     k
0  0  0.0303   2  0.606  0.31  0.04  0.23  0.03  0.38
1  1  0.0779   2         0.00  0.00  0.05  0.01  0.00
The first few columns are being read in as strings, and the last few as NaN, even when there is a numeric value there. When I include dtype='float64' I get:
ValueError: could not convert string to float:
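For reference, here is a minimal sketch that reproduces the dtype problem, assuming the missing entries in the file are blank/whitespace cells between the semicolons (the inline sample data below is made up to match the rows shown above; 'file path' stands in for the real file):

import pandas as pd
from io import StringIO

# made-up sample mimicking the file: semicolon-separated, with the
# missing S value in the second row left as a whitespace cell
raw = ("I;z;Sp;S;B;B/T;r;gf;k\n"
       "0;0.0303;2;0.606;0.31;0.04;0.23;0.03;0.38\n"
       "1;0.0779;2; ;0.00;0.00;0.05;0.01;0.00\n")

dat = pd.read_table(StringIO(raw), delimiter=';')
print(dat.dtypes)   # S comes back as object because of the ' ' cell

# adding dtype='float64' then fails on that whitespace string:
# pd.read_table(StringIO(raw), delimiter=';', dtype='float64')
# ValueError: could not convert string to float: ' '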
Any help in fixing this?
Upvotes: 2
Views: 1729
Reputation: 863166
You can use replace with a regex for one or more whitespace characters, replacing matches with NaN, then cast to float. Empty strings in the data are already converted to NaN by read_table.
import numpy as np

# cells matching whitespace become NaN, then cast all columns to float
df = df.replace({r'\s+': np.nan}, regex=True).astype(float)
print(df)
     I       z   Sp      S     B   B/T     r    gf     k
0  0.0  0.0303  2.0  0.606  0.31  0.04  0.23  0.03  0.38
1  1.0  0.0779  2.0    NaN  0.00  0.00  0.05  0.01  0.00
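Note that astype(float) casts every column to float64, which is why I and Sp display as 0.0 and 2.0 above; a quick dtype check confirms it:

print(df.dtypes)   # all nine columns are float64 after astype(float)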
If the data contains other strings that need to be converted to NaN, you can use to_numeric with apply:
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
print(df)
   I       z  Sp      S     B   B/T     r    gf     k
0  0  0.0303   2  0.606  0.31  0.04  0.23  0.03  0.38
1  1  0.0779   2    NaN  0.00  0.00  0.05  0.01  0.00
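An equivalent, slightly shorter spelling also works, because apply forwards keyword arguments to the applied function:

# apply passes errors='coerce' straight through to pd.to_numeric
df = df.apply(pd.to_numeric, errors='coerce')
print(df.dtypes)   # I and Sp stay integer, S is float64 because it now holds NaN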
Upvotes: 2