John Laudun

Reputation: 407

Python: IndexError: list index out of range (reading from CSV with 3 columns)

I am working on creating a stacked bar graph drawn from data in a CSV file. The data looks like this:

ANC-088,333,148
ANC-089,153,86
ANC-090,138,75

There are more rows just like this.

The beginning script I have, just to start playing with bar graphs, looks like this:

from pylab import *

name = []
totalwords = []
uniquewords = []

readFile = open('wordstats-legends.csv', 'r').read()
eachLine = readFile.split('\n')

for line in eachLine:
    split = line.split(',')
    name.append(split[0])
    totalwords.append(split[1])
    uniquewords.append(int(split[2]))

pos = arange(len(name)) + 0.5
bar(pos, totalwords, align = 'center', color='red')
xticks(pos, name)

When I ran the script to see how things were going, I got the following error:

---> 13     totalwords.append(split[1])
IndexError: list index out of range

What am I not seeing, and what are my first steps in fixing this? (Additional explanations are most welcome, as I continue to teach myself this stuff.)

Upvotes: 0

Views: 322

Answers (2)

Adalee

Reputation: 538

If you are sure the whole file looks like you described, the problem is the final newline at the end of the file. Splitting on '\n' puts an empty string into eachLine after that last newline, because there is nothing after it. For that empty line, ''.split(',') returns [''], a one-element list, so split[1] raises IndexError. You only need to omit that last element of eachLine, e.g. with eachLine.pop() after splitting.
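A small sketch that reproduces this without the real file, assuming the file ends with a newline after its last row (the string below is a stand-in built from the three sample rows in the question):

```python
# Stand-in for the file contents: three rows, each ending in a newline
text = "ANC-088,333,148\nANC-089,153,86\nANC-090,138,75\n"

eachLine = text.split('\n')
print(eachLine[-1] == '')   # True: the trailing newline leaves an empty string
print(''.split(','))        # ['']: only one element, so split[1] raises IndexError

eachLine.pop()              # drop the empty last element
rows = [line.split(',') for line in eachLine]
print(rows)
```

If the file might not end with a newline, pop() would discard a real row. text.splitlines() avoids that case, since it does not produce an empty string for a final line break.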

If you would like a robust and general solution that handles every line you cannot split into three parts, use the solution from user1823. However, if the problem really is only what I described above, that check splits every line a second time, which can slow you down on larger files.
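A different robust option, not what either answer here uses, is the standard library csv module, which yields an empty list for blank lines and handles quoted fields. This is only a sketch: read_rows and min_fields are hypothetical names, and io.StringIO stands in for the real file (opened with newline='' as the csv documentation recommends).

```python
import csv
import io


def read_rows(f, min_fields=3):
    """Yield (name, total, unique) for every row with at least min_fields fields."""
    for row in csv.reader(f):
        if len(row) < min_fields:
            continue  # skip blank or short lines instead of crashing
        yield row[0], int(row[1]), int(row[2])


# Stand-in for open('wordstats-legends.csv', newline=''); note the blank line
sample = io.StringIO("ANC-088,333,148\nANC-089,153,86\n\nANC-090,138,75\n")
rows = list(read_rows(sample))
print(rows)
```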

Upvotes: 1

user1823

Reputation: 1111

Evidently this is a problem with your .csv: one or more of your lines does not contain the expected three comma-separated fields. You can skip those lines like this:

eachLine = [item for item in readFile.split('\n') if len(item.split(',')) >= 3]
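To see what the filter keeps, here it is on a made-up string: the three sample rows, plus a hypothetical malformed row (ANC-091,12) and the trailing newline that yields an empty string.

```python
# Made-up input: three good rows, one short row, and a trailing newline
text = "ANC-088,333,148\nANC-089,153,86\nANC-090,138,75\nANC-091,12\n"

# Keep only lines that split into at least three comma-separated fields
eachLine = [item for item in text.split('\n') if len(item.split(',')) >= 3]
print(eachLine)  # ['ANC-088,333,148', 'ANC-089,153,86', 'ANC-090,138,75']
```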

Like so:

from pylab import *

name = []
totalwords = []
uniquewords = []

# Read the file and keep only lines that split into at least 3 fields
with open('wordstats-legends.csv', 'r') as readFile:
    eachLine = [item for item in readFile.read().split('\n') if len(item.split(',')) >= 3]

for line in eachLine:
    split = line.split(',')
    name.append(split[0])
    totalwords.append(int(split[1]))  # convert to int so bar() gets numbers, not strings
    uniquewords.append(int(split[2]))

pos = arange(len(name)) + 0.5
bar(pos, totalwords, align='center', color='red')
xticks(pos, name)
show()
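One possible refinement, sketched here as an assumption rather than part of either answer: move the parsing into its own function (parse_wordstats is a hypothetical name) so it can be checked without drawing anything, and split each line only once, which also sidesteps the double-splitting cost mentioned in the other answer.

```python
def parse_wordstats(text):
    """Return (name, totalwords, uniquewords) lists from CSV text, skipping short lines."""
    name, totalwords, uniquewords = [], [], []
    for line in text.split('\n'):
        split = line.split(',')   # split each line exactly once
        if len(split) < 3:
            continue              # blank or malformed line
        name.append(split[0])
        totalwords.append(int(split[1]))
        uniquewords.append(int(split[2]))
    return name, totalwords, uniquewords


name, totalwords, uniquewords = parse_wordstats("ANC-088,333,148\nANC-089,153,86\n")
print(name, totalwords, uniquewords)
```

The plotting lines can then call parse_wordstats on the file contents and use the returned lists unchanged.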

Upvotes: 1
