pandas read_csv: dtype does not work on the index column

Question

I have the following data

PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410

I am running this command:

actual = pd.read_csv(csv_file_path, index_col=["CUSIP Header"],
                     dtype={"CUSIP Header": str}, usecols=["Date", "CUSIP Header"],
                     parse_dates=["Date"])

However, it seems that the CUSIP Header values are not parsed as str but as floats. Indeed, when I tried to call

print (actual.xs("68391610"))

I got a KeyError.

jezrael · Accepted Answer

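This is pandas bug 9435: the dtype given for a column is not applied when that column is also passed as index_col. Remove the index_col parameter and use set_index instead: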

df = pd.read_csv(csv_file_path,
                 dtype={"CUSIP Header": str}, usecols=["Date", "CUSIP Header"],
                 parse_dates=["Date"]).set_index("CUSIP Header")

print (df)
                   Date
CUSIP Header           
68391610     1985-12-31
68391610     1986-03-31
36720410     1985-12-31
36720410     1986-01-31
36720410     1986-02-28

print (df.index)
Index(['68391610', '68391610', '36720410', '36720410', '36720410'],
       dtype='object', name='CUSIP Header')
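
To try the workaround without the original file, here is a minimal self-contained sketch. It assumes the first rows of the sample data from the question, loaded from an in-memory string (the csv_text variable is only for illustration and stands in for csv_file_path). It reads the CUSIP column as str, sets it as the index afterwards, and then repeats the xs lookup that raised the KeyError:

```python
import io

import pandas as pd

# Sample rows copied from the question; csv_text is an illustrative
# stand-in for the real file behind csv_file_path.
csv_text = """PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
"""

# Read the CUSIP column as str first, then make it the index,
# instead of passing index_col to read_csv.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"CUSIP Header": str},
    usecols=["Date", "CUSIP Header"],
    parse_dates=["Date"],
).set_index("CUSIP Header")

# The index labels are now strings, so a string key can be looked up.
rows = df.xs("68391610")
print(rows)
```

Because the label appears more than once in the index, xs returns a DataFrame with all matching rows rather than a single Series.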
