Reputation: 623
I am trying to load a csv file with OHLC data in the following format.
In [49]: !head '500008.csv'
03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
06 Jan 2000,12.60,13.30,12.27,12.34,2800500,,,,
07 Jan 2000,12.53,12.70,11.82,12.57,2763000,,,,
10 Jan 2000,13.58,13.58,13.58,13.58,13500,,,,
11 Jan 2000,14.66,14.66,13.40,13.47,1694220,,,,
12 Jan 2000,13.66,13.99,13.20,13.54,519164,,,,
13 Jan 2000,13.67,13.87,13.54,13.80,278400,,,,
14 Jan 2000,13.84,13.99,13.30,13.50,718814,,,,
I tried the following, which loads the data:
df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6),
              header=None, index_col=0)
But now I want to give the columns names, so I tried:
df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6),
              header=None, index_col=0, names='d o h l c v'.split())
but this fails with:
IndexError: list index out of range
Can someone point out what I am doing wrong?
Upvotes: 2
Views: 4433
Reputation: 2751
This is a bug. I had the same problem and came up with two workarounds and have submitted a pull request to fix it.
Upvotes: 0
Reputation: 15345
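I don't know if it's a bug or a feature, but you have to specify names for all the columns present in the file, even if you pass only a subset of them to usecols:
import pandas as pd
from io import StringIO  # Python 2: from StringIO import StringIO

# raw is a string holding the CSV contents from the question (10 fields per row)
df = pd.read_csv(StringIO(raw),
                 parse_dates=True,
                 header=None,
                 index_col=0,
                 usecols=[0,1,2,3,4,5],
                 names='0 1 2 3 4 5 6 7 8 9'.split())
which gives: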
                1      2      3      4        5
0
2000-01-03  12.85  13.11  12.74  13.11   976500
2000-01-04  13.54  13.60  12.56  13.33  2493000
2000-01-05  12.68  13.34  12.37  12.68  1680000
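Applied to the question's file with the asker's own labels, the same idea might look like the sketch below. It assumes a recent pandas version; the sample data is the first three rows from the question, the labels d, o, h, l, c, v come from the question, and x1 to x4 are made-up placeholders for the four empty trailing fields. Passing labels (rather than positions) to usecols and index_col avoids any doubt about which list the positions refer to.

```python
import io

import pandas as pd

# First three rows of the question's file: 6 real fields plus 4 empty ones.
raw = """03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
"""

# One label per column in the file (10 in total); x1..x4 are hypothetical
# placeholder names for the empty trailing fields.
all_names = ["d", "o", "h", "l", "c", "v", "x1", "x2", "x3", "x4"]

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=all_names,
    usecols=["d", "o", "h", "l", "c", "v"],  # keep only the first six columns
    index_col="d",                           # use the date column as the index
    parse_dates=["d"],                       # parse "03 Jan 2000"-style dates
)
print(df)
```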
I figured this out by starting from the edge case where you pass a full list to both names and usecols, and then gradually shrinking the lists to see what happens.
What is weird is the error message you get when you try, for instance, usecols=[1,2,3] and names=['1','2','3']:
ValueError: Passed header names mismatches usecols
which does not make sense, since three names are passed for three selected columns.
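Another way around the problem, not taken from this answer but a sketch assuming a recent pandas version, is to avoid passing names to read_csv at all: select the columns by position, then label them and build the date index afterwards. The format string "%d %b %Y" is an assumption based on the dates shown in the question.

```python
import io

import pandas as pd

# Two rows from the question's file (6 real fields plus 4 empty ones).
raw = """03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
"""

# Read only the first six columns by position, without passing names.
df = pd.read_csv(io.StringIO(raw), header=None, usecols=list(range(6)))

# Label the columns afterwards, then turn the date column into the index.
df.columns = ["d", "o", "h", "l", "c", "v"]
df["d"] = pd.to_datetime(df["d"], format="%d %b %Y")
df = df.set_index("d")
print(df)
```

This sidesteps the names/usecols interaction entirely, at the cost of a couple of extra lines after the read.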
Upvotes: 5