Cmf55

Reputation: 61

Pandas Dataframe object type

I have a large DataFrame (about 1 million rows and 9 columns), and some rows are missing data in a few of the columns.

dat = pd.read_table('file path', delimiter=';')

I        z        Sp   S        B        B/T     r        gf      k
0        0.0303   2    0.606    0.31     0.04    0.23     0.03    0.38   
1        0.0779   2             0.00     0.00    0.05     0.01    0.00

The first few columns are being read in as strings, and the last few as NaN, even where there is a numeric value. When I pass dtype='float64', I get:

ValueError: could not convert string to float: 

Any help in fixing this?

Upvotes: 2

Views: 1729

Answers (1)

jezrael

Reputation: 863166

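You can use replace with a regex to turn values matching one or more whitespace characters into NaN, then cast to float.

Empty strings in the data are already converted to NaN by read_table, so only the cells that contain whitespace need handling: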

import numpy as np

# raw string so that \s is passed to the regex engine unchanged
df = df.replace({r'\s+': np.nan}, regex=True).astype(float)
print(df)
     I       z   Sp      S     B   B/T     r    gf     k
0  0.0  0.0303  2.0  0.606  0.31  0.04  0.23  0.03  0.38
1  1.0  0.0779  2.0    NaN  0.00  0.00  0.05  0.01  0.00
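
One caveat worth checking on your own data: as far as I understand, when the replacement is NaN, pandas replaces the whole cell if the pattern matches anywhere in it, so an unanchored `\s+` could also blank a value with stray spaces around it. A minimal sketch of an anchored variant follows; the frame `raw` and its values are hypothetical stand-ins for the question's data, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: every cell is a string,
# and the missing value in column S is whitespace only.
raw = pd.DataFrame({
    "z": ["0.0303", "0.0779"],
    "S": ["0.606", "   "],
    "B": ["0.31", "0.00"],
})

# ^\s*$ matches only cells that are empty or consist entirely of whitespace.
cleaned = raw.replace(r"^\s*$", np.nan, regex=True).astype(float)
print(cleaned)
```

Anchoring with `^` and `$` keeps the replacement limited to blank cells, which is usually what is wanted before the cast to float.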

If the data also contains other strings that should become NaN, you can use to_numeric with apply:

df = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
print(df)
   I       z  Sp      S     B   B/T     r    gf     k
0  0  0.0303   2  0.606  0.31  0.04  0.23  0.03  0.38
1  1  0.0779   2    NaN  0.00  0.00  0.05  0.01  0.00
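
To try the to_numeric approach end to end without the original file, here is a small self-contained sketch. The semicolon-delimited `text` is a hypothetical sample shaped like the question's data, and `dtype=str` is my own choice to make the conversion step explicit; it is not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample standing in for the real file.
text = "I;z;Sp;S;B\n0;0.0303;2;0.606;0.31\n1;0.0779;2;   ;0.00\n"

# dtype=str keeps every cell as text, so nothing is converted implicitly.
df = pd.read_csv(io.StringIO(text), sep=";", dtype=str)

# Convert column by column; cells that cannot be parsed as numbers
# (here the whitespace-only S value) are coerced to NaN.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
print(df)
```

Note that errors='coerce' silently turns every unparseable cell into NaN, so it may be worth counting NaN values per column afterwards to confirm nothing unexpected was dropped.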

Upvotes: 2
