Luca

Reputation: 1810

pandas.read_fwf ignores dtypes provided

I'm importing a dataframe from a text file. I'd like to specify the data type of each column, but pandas seems to ignore the dtype input.

A working example:

from io import StringIO
import pandas as pd

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)

df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 dtypes = {'USAF'         : str,
                           'WBAN'         : str,
                           'STATION NAME' : str,
                           'CT'           : str,
                           'ST'           : str,
                           'CALL'         : str,
                           'LAT'          : float,
                           'LON'          : float,
                           'ELEV(M)'      : float,
                           'BEGIN'        : int,
                           'END'          : int,},
                 )
df.dtypes

returns

USAF              int64
WBAN              int64
STATION NAME     object
CT               object
ST              float64
CALL            float64
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object

Why does this happen? How can I force the first columns to be strings?

Upvotes: 3

Views: 1930

Answers (1)

EoinS

Reputation: 5482

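read_fwf did not apply the types you asked for; the dtypes shown are the ones pandas inferred from the data in each column. Use converters to set the string columns explicitly. This has to happen while the file is being read: once a column has been parsed as int64, 007026 is already 7026, and converting to str afterwards will not bring the leading zeros back.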

from io import StringIO
import pandas as pd

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 # each converter receives the raw text of the field
                 converters = {'USAF'         : str,
                               'WBAN'         : str,
                               'STATION NAME' : str,
                               'CT'           : str,
                               'ST'           : str,
                               'CALL'         : str}
                 )
df.dtypes

returns

USAF             object
WBAN             object
STATION NAME     object
CT               object
ST               object
CALL             object
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object
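
A side note: the keyword pandas documents is dtype (singular), while the question passes dtypes, which may be why it looked as if it was ignored. Below is a minimal sketch, assuming a reasonably recent pandas where read_fwf forwards dtype to the parser. The three-column sample text and the variable names are made up for illustration and are much shorter than the real station file. It also shows why the conversion has to happen at read time.

```python
from io import StringIO
import pandas as pd

# Hypothetical, cut-down sample in the same layout as the station file.
text = (
    "USAF   WBAN  LAT    \n"
    "007026 99999 +00.000\n"
    "007070 99999 +00.000\n"
)
colspecs = [(0, 6), (7, 12), (13, 20)]

# Let pandas infer the types, then convert afterwards:
# USAF is parsed as an integer first, so the leading zeros are already gone.
inferred = pd.read_fwf(StringIO(text), colspecs=colspecs)
late = inferred["USAF"].astype(str).tolist()

# Ask for strings while reading (note: dtype, not dtypes).
typed = pd.read_fwf(
    StringIO(text),
    colspecs=colspecs,
    dtype={"USAF": str, "WBAN": str},
)
early = typed["USAF"].tolist()

print(late)
print(early)
```

If dtype does not behave this way on your pandas version, the converters approach above remains the fallback.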

Upvotes: 4
