Reputation: 1062
I have the following data:
PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410
I am running this command:
pd.read_csv(csv_file_path, index_col=["CUSIP Header"],
dtype = {"CUSIP Header": str}, usecols =["Date", "CUSIP Header"],
parse_dates=['Date'])
However, it seems that the CUSIP Header values are not parsed as str but as floats. Indeed, when I tried to call
print (actual.xs("68391610"))
I got a KeyError.
Upvotes: 1
Views: 891
Reputation: 863711
It is bug 9435: the dtype setting is not applied to a column that is also used as index_col. Remove the index_col parameter and use set_index instead:
import pandas as pd

# Read CUSIP Header as an ordinary str column first, then make it the index
df = pd.read_csv(csv_file_path,
                 dtype={'CUSIP Header': str}, usecols=["Date", "CUSIP Header"],
                 parse_dates=['Date']).set_index('CUSIP Header')
print (df)
                   Date
CUSIP Header
68391610     1985-12-31
68391610     1986-03-31
36720410     1985-12-31
36720410     1986-01-31
36720410     1986-02-28
print (df.index)
Index(['68391610', '68391610', '36720410', '36720410', '36720410'],
dtype='object', name='CUSIP Header')
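To check that the string lookup from the question now works, here is a minimal self-contained sketch. It assumes the sample rows from the question and uses io.StringIO as a stand-in for csv_file_path (that substitution and the variable names csv_text and rows are mine, not from the original code):

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for csv_file_path.
csv_text = """PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410
"""

# Same call as above: no index_col, dtype applied to a regular column,
# then set_index promotes the already-string column to the index.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"CUSIP Header": str},
    usecols=["Date", "CUSIP Header"],
    parse_dates=["Date"],
).set_index("CUSIP Header")

# The index labels are strings, so the string key from the question is found.
rows = df.xs("68391610")
print(rows)
```

Because the CUSIP values repeat, xs returns every matching row rather than a single one, so rows should be a small DataFrame here.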
Upvotes: 2