I'm importing a DataFrame from a fixed-width text file with read_fwf.
I'd like to specify the data type of each column, but pandas seems to ignore the dtype input.
A minimal reproducible example:
from io import StringIO
import pandas as pd

string = ('USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n'
          '007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n'
          '007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926')
f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs=[(0, 6), (7, 12), (13, 41), (43, 45), (48, 50), (51, 55),
                           (57, 64), (65, 73), (74, 81), (82, 90), (91, 101)],
                 dtypes={'USAF': str,
                         'WBAN': str,
                         'STATION NAME': str,
                         'CT': str,
                         'ST': str,
                         'CALL': str,
                         'LAT': float,
                         'LON': float,
                         'ELEV(M)': float,
                         'BEGIN': int,
                         'END': int},
                 )
df.dtypes
returns
USAF int64
WBAN int64
STATION NAME object
CT object
ST float64
CALL float64
LAT float64
LON float64
ELEV(M) float64
BEGIN int64
END int64
dtype: object
Why does this happen? How can I force the first columns to be strings?
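The dtypes mapping is never applied, so pandas infers each column's type on its own: USAF and WBAN contain only digits and become int64, and ST and CALL are blank in every row, so they are read as NaN and become float64. The keyword that read_fwf understands is dtype, not dtypes; the misspelled name is silently ignored, which is why the output above shows inferred types. Passing converters explicitly avoids the inference for the text columns.
This has to happen while the DataFrame is being created: once USAF has been parsed as an integer, its leading zeros are gone (007026 becomes 7026), and converting the column to str afterwards cannot bring them back.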
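The point about converting afterwards can be checked with a small sketch. It uses a hypothetical two-column sample (a zero-padded ID and a name) rather than the full file, and compares casting to str after parsing with passing a converter at read time.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-column fixed-width sample: ID in chars 0-5, NAME in chars 7-11.
data = "ID     NAME\n007026 WXPOD\n007070 WXPOD\n"
colspecs = [(0, 6), (7, 12)]

# Converting after parsing: ID has already been inferred as int64,
# so the leading zeros are gone before astype(str) runs.
late = pd.read_fwf(StringIO(data), colspecs=colspecs)
late_ids = late["ID"].astype(str).tolist()

# Converting during parsing: the converter receives the raw field text.
early = pd.read_fwf(StringIO(data), colspecs=colspecs, converters={"ID": str})
early_ids = early["ID"].tolist()

print(late_ids)   # ['7026', '7070']
print(early_ids)  # ['007026', '007070']
```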
from io import StringIO
import pandas as pd

string = ('USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n'
          '007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n'
          '007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926')
f = StringIO(string)

# Columns that must stay text. A converter receives the raw field as a string,
# so str keeps values such as '007026' exactly as they appear in the file.
text_columns = ['USAF', 'WBAN', 'STATION NAME', 'CT', 'ST', 'CALL']
df = pd.read_fwf(f,
                 colspecs=[(0, 6), (7, 12), (13, 41), (43, 45), (48, 50), (51, 55),
                           (57, 64), (65, 73), (74, 81), (82, 90), (91, 101)],
                 converters={col: str for col in text_columns},
                 )
>>> df.dtypes
USAF object
WBAN object
STATION NAME object
CT object
ST object
CALL object
LAT float64
LON float64
ELEV(M) float64
BEGIN int64
END int64
dtype: object
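The keyword read_fwf uses for per-column types is dtype (singular), while the question passes dtypes. Below is a minimal sketch of the dtype route, assuming a reasonably recent pandas whose fixed-width parser honours dtype. To keep the sample correctly aligned without counting spaces by hand, it uses a hypothetical helper, fixed_width_line, and only five of the question's columns at the same character offsets.

```python
from io import StringIO

import pandas as pd

# Five of the question's columns, with the same character offsets.
COLSPECS = [(0, 6), (7, 12), (13, 41), (57, 64), (82, 90)]


def fixed_width_line(fields):
    """Hypothetical helper: write each field at the start offset of its colspec."""
    width = max(end for _, end in COLSPECS)
    line = [" "] * width
    for (start, end), text in zip(COLSPECS, fields):
        assert len(text) <= end - start, f"{text!r} does not fit in ({start}, {end})"
        line[start:start + len(text)] = text
    return "".join(line).rstrip()


rows = [
    ["USAF", "WBAN", "STATION NAME", "LAT", "BEGIN"],
    ["007026", "99999", "WXPOD 7026", "+00.000", "20120713"],
    ["007070", "99999", "WXPOD 7070", "+00.000", "20140923"],
]
text = "\n".join(fixed_width_line(row) for row in rows) + "\n"

# Note the keyword: dtype, not dtypes.
df = pd.read_fwf(
    StringIO(text),
    colspecs=COLSPECS,
    dtype={"USAF": str, "WBAN": str, "STATION NAME": str, "LAT": float, "BEGIN": int},
)

print(df["USAF"].tolist())  # ['007026', '007070']
print(df.dtypes)
```

dtype is the more direct fix when you only need a type; converters are useful when each field also needs some transformation beyond a type cast.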