Satya

Reputation: 13

IndexError when using split function

Please help with my code. I am getting

IndexError: list index out of range

when I use

split(",")[1] and split(",")[2]

These work fine instead:

split(",")[0] and split(",")[-1]

I'd appreciate your help.

My data looks like this:

  INPUT.csv
 col0  col1    col2    col3     col4
 blue,  eight,  line,  aaa     [email protected],[email protected],[email protected]
 green, nine,   square, bbb    [email protected],[email protected],[email protected]


Expected output:

 OUTPUT.csv
  col0  col1    col2    col3    col4          col5           col6
 blue    eight    line   aaa    [email protected]   [email protected]    [email protected]
 green,  nine,   square, bbb     [email protected]   [email protected]    [email protected]

My code so far:

import csv

with open('INPUT.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open('OUTPUT.csv', 'w', encoding='utf-8') as new_file:
        fieldnames = ['col0', 'col1', 'col2', 'col3', 'col4', 'col5', 'col6']
        csv_writer = csv.DictWriter(new_file, lineterminator='\n',
                                    fieldnames=fieldnames)

        for row in csv_reader:
            csv_writer.writerow({
                "col0": row["col0"],
                "col1": row["col1"],
                "col4": row["col4"].split(",")[0].strip(),
                "col5": row["col4"].split(",")[1].strip(),
                "col6": row["col4"].split(",")[2].strip(),
            })

Upvotes: 0

Views: 92

Answers (2)

abarnert

Reputation: 365717

You're reading the file as comma-separated values. So, look at this line:

green, nine,   square, bbb    [email protected],[email protected],[email protected]

The values, separated by commas, are:

green
 nine
   square
 bbb    [email protected]
[email protected]
[email protected]

So, your column 4 is [email protected]. When you try to split that on commas, of course it doesn't have any, so you get back only one result, and then you ask for the second and third values that don't exist.
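If you want to see this for yourself, a quick sketch along these lines prints what the csv module actually produces for that row. The addresses are made-up placeholders (the real ones are hidden in the post), and io.StringIO stands in for the file:

```python
import csv
import io

# Placeholder addresses; the row mimics the malformed layout from the question
# (no comma between "bbb" and the first address).
line = "green, nine,   square, bbb    a@example.com,b@example.com,c@example.com\n"

fields = next(csv.reader(io.StringIO(line)))
print(fields)
# ['green', ' nine', '   square', ' bbb    a@example.com', 'b@example.com', 'c@example.com']

fifth = fields[4]          # the value that lands in position 4
parts = fifth.split(",")   # no commas left in it, so only one piece comes back
print(parts)               # ['b@example.com']
```

With only one element in parts, parts[0] and parts[-1] both work, while parts[1] and parts[2] raise IndexError.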


You need to fix your CSV file to actually be a CSV file.

That includes putting a comma after the bbb column, and after each column in the header.

And, more importantly, it means not using commas inside columns when you're using the same commas to separate the columns. The result is at best ambiguous, and therefore it can't be parsed.

Ways around this include:

  • Quote the strings with commas in them.
  • Escape the commas.
  • Use a different separator within the column.
  • Use a different separator between the columns.

(You could almost use ", " as a column delimiter here, but that's really hacky, and any human editing your file is going to break it.)
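As a rough illustration of the first and last options, the csv module quotes a field for you when it contains the delimiter, and it accepts a different delimiter if you pass one. The row values below are placeholders, and the tab is just one possible choice of separator:

```python
import csv
import io

# Placeholder row; the last field holds three addresses joined by commas.
row = ["blue", "eight", "line", "aaa", "a@example.com,b@example.com,c@example.com"]

# Quote the strings with commas in them: the writer does this automatically.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()
print(quoted_line)   # blue,eight,line,aaa,"a@example.com,b@example.com,c@example.com"

# Use a different separator between the columns: here a tab, so no quoting is needed.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
tabbed_line = buf.getvalue()

# Both forms read back to the original five fields.
assert next(csv.reader(io.StringIO(quoted_line))) == row
assert next(csv.reader(io.StringIO(tabbed_line), delimiter="\t")) == row
```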


Here's an example that could work, with the multi-address column quoted so its commas aren't treated as column separators:

 col0,  col1,   col2,  col3,     col4
 blue,  eight,  line,  aaa,     "[email protected],[email protected],[email protected]"
 green, nine,   square, bbb,    "[email protected],[email protected],[email protected]"

Even with all that messy spacing (that you always get from human-edited files), this can be parsed cleanly and unambiguously with the right dialect parameters:

csv_reader = csv.DictReader(csv_file, skipinitialspace=True)

Now, each row looks like this:

{'col0': 'blue',
 'col1': 'eight',
 'col2': 'line',
 'col3': 'aaa',
 'col4': '[email protected],[email protected],[email protected]'}

… so now you can call row["col4"].split(",") and get back:

['[email protected]', '[email protected]', '[email protected]']

And then, [1] and [2] will work.
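Here is a small self-contained check of that, as a sketch only. It assumes the multi-address field is wrapped in double quotes in the file, uses placeholder addresses, and reads from io.StringIO instead of INPUT.csv:

```python
import csv
import io

# Assumed file contents: messy spacing, placeholder addresses,
# and the multi-address field wrapped in double quotes.
data = (
    'col0,  col1,   col2,  col3,     col4\n'
    'blue,  eight,  line,  aaa,     "a@example.com,b@example.com,c@example.com"\n'
)

# skipinitialspace=True drops the spaces that follow each delimiter,
# which also lets the opening quote be recognised as a quote.
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
row = next(reader)

addresses = row["col4"].split(",")
print(addresses[1], addresses[2])
```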


However, you still have at least one more problem in your code. Your desired output includes columns 2 and 3, but you're explicitly leaving them out of the writerow.

While we're at it, there's no reason to try to cram 7 lines of code into one expression. So, why not just split the row once?

col456 = row["col4"].split(",")

And then, we can just modify row in-place:

row["col4"], row["col5"], row["col6"] = col456

… and now:

csv_writer.writerow(row)
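Putting the pieces together, the whole script might look something like the sketch below. It writes its own sample input into a temporary directory so it can run anywhere; the addresses are placeholders, the quoted col4 field is an assumption about how the file gets fixed, and writeheader() is added because the expected output shows a header row:

```python
import csv
import os
import tempfile

# Build a small sample input (placeholder data) in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "INPUT.csv")
out_path = os.path.join(workdir, "OUTPUT.csv")
with open(in_path, "w", encoding="utf-8") as f:
    f.write('col0,  col1,   col2,  col3,     col4\n'
            'blue,  eight,  line,  aaa,     "a@example.com,b@example.com,c@example.com"\n'
            'green, nine,   square, bbb,    "d@example.com,e@example.com,f@example.com"\n')

fieldnames = ["col0", "col1", "col2", "col3", "col4", "col5", "col6"]

with open(in_path, newline="", encoding="utf-8") as csv_file, \
     open(out_path, "w", newline="", encoding="utf-8") as new_file:
    csv_reader = csv.DictReader(csv_file, skipinitialspace=True)
    csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames, lineterminator="\n")
    csv_writer.writeheader()
    for row in csv_reader:
        # Split once; strip in case someone typed spaces around the commas.
        col456 = [part.strip() for part in row["col4"].split(",")]
        row["col4"], row["col5"], row["col6"] = col456
        csv_writer.writerow(row)

with open(out_path, encoding="utf-8") as f:
    output_lines = f.read().splitlines()
print("\n".join(output_lines))
```

Note that the three-way unpacking raises ValueError for any row that doesn't have exactly three addresses, so real data may need a check before that line.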

Upvotes: 4

DeepSpace

Reputation: 81604

If a string does not contain any ',' then string.split(',') returns a list with a single element: the entire string. In that case, string.split(',')[1] will raise IndexError.

That is also why [0] and [-1] work for you: when li is a list with a single element, li[0] and li[-1] are the same element.
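A short sketch of both points, using a placeholder address rather than the real data:

```python
text = "a@example.com"       # placeholder value with no comma in it
li = text.split(",")
print(li)                    # ['a@example.com']
print(li[0] == li[-1])       # True: first and last are the same single element

try:
    li[1]                    # there is no second element
except IndexError as exc:
    message = str(exc)
    print(message)           # list index out of range
```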

Upvotes: 1
