psykeedelik

Reputation: 623

usecols with parse_dates and names

I am trying to load a csv file with OHLC data in the following format.

In [49]: !head '500008.csv'
03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
06 Jan 2000,12.60,13.30,12.27,12.34,2800500,,,,
07 Jan 2000,12.53,12.70,11.82,12.57,2763000,,,,
10 Jan 2000,13.58,13.58,13.58,13.58,13500,,,,
11 Jan 2000,14.66,14.66,13.40,13.47,1694220,,,,
12 Jan 2000,13.66,13.99,13.20,13.54,519164,,,,
13 Jan 2000,13.67,13.87,13.54,13.80,278400,,,,
14 Jan 2000,13.84,13.99,13.30,13.50,718814,,,,

I tried the following, which loads the data:

df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6), 
                            header=None, index_col=0)

But now I want to name the columns. So I tried:

df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6),
                            header=None, index_col=0, names='d o h l c v'.split())

but this fails with:

IndexError: list index out of range

Can someone point out what I am doing wrong?

Upvotes: 2

Views: 4433

Answers (2)

Weston

Reputation: 2751

This is a bug. I had the same problem and came up with two workarounds and have submitted a pull request to fix it.

Upvotes: 0

tzelleke

Reputation: 15345

I don't know whether it's a bug or a feature, but you have to specify names for all columns present in the file, even if you pass only a subset of them to usecols:

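from io import StringIO
import pandas as pd

# raw is assumed to hold the CSV text from the question as a string
df = pd.read_csv(StringIO(raw),
                 parse_dates=True,
                 header=None,
                 index_col=0,
                 usecols=[0,1,2,3,4,5],
                 names='0 1 2 3 4 5 6 7 8 9'.split())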

which gives

                1      2      3      4        5
0                                              
2000-01-03  12.85  13.11  12.74  13.11   976500
2000-01-04  13.54  13.60  12.56  13.33  2493000
2000-01-05  12.68  13.34  12.37  12.68  1680000

I figured this out by starting from the edge case where both names and usecols list every column, then gradually shrinking the lists to see what happens.

What is weird is the error message you get when you try, for instance, usecols=[1,2,3] and names=['1','2','3']:

ValueError: Passed header names mismatches usecols

which does not make sense...
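
For the column names used in the question, the same workaround might look like the sketch below. It assumes the file has ten fields per row (six with data plus four empty trailing ones, as in the sample), and the names x1 to x4 are hypothetical placeholders I made up for the empty columns. The data is inlined here only so the snippet runs on its own.

```python
from io import StringIO

import pandas as pd

# First three rows of the sample file from the question.
raw = """03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
"""

# One name per column in the file (10 in total). The x1..x4 names are
# hypothetical placeholders for the four empty trailing columns.
names = 'd o h l c v x1 x2 x3 x4'.split()

df = pd.read_csv(StringIO(raw),
                 header=None,
                 names=names,
                 usecols=list(range(6)),  # keep only d, o, h, l, c, v
                 index_col=0,             # use the date column as the index
                 parse_dates=True)        # parse the index as dates

print(df)
```

With a real file you would pass '500008.csv' instead of StringIO(raw). Newer pandas versions may also accept a names list that matches only the selected columns, but I have not checked that against the version in the question.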

Upvotes: 5
