IndexError when using split function

Question

Please help with my code. I am getting

IndexError: list index out of range

when I use

split(",")[1] and split(",")[2]

These work fine instead:

split(",")[0] and split(",")[-1]

I appreciate your help.

My data looks like this:

  INPUT.csv
 col0  col1    col2    col3     col4
 blue,  eight,  line,  aaa     abc@123.com,xyz@123.com,ghi@123.com
 green, nine,   square, bbb    sdf@123.com,wef@123.com,hft@123.com


Expected output:

 OUTPUT.csv
 col0    col1    col2    col3   col4          col5          col6
 blue    eight   line    aaa    abc@123.com   xyz@123.com   ghi@123.com
 green   nine    square  bbb    sdf@123.com   wef@123.com   hft@123.com

My code so far:

import csv

with open('INPUT.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open('OUTPUT.csv', 'w', encoding='utf-8') as new_file:
        fieldnames = ['col0', 'col1', 'col2', 'col3', 'col4', 'col5', 'col6']
        csv_writer = csv.DictWriter(new_file, lineterminator='\n',
                                    fieldnames=fieldnames)

        for row in csv_reader:
            csv_writer.writerow({
                "col0": row["col0"],
                "col1": row["col1"],
                "col4": row["col4"].split(",")[0].strip(),
                "col5": row["col4"].split(",")[1].strip(),
                "col6": row["col4"].split(",")[2].strip(),
            })

abarnert · Accepted Answer

You're reading the file as comma-separated values. So, look at this line:

green, nine,   square, bbb    sdf@123.com,wef@123.com,hft@123.com

The values, separated by commas, are:

green
 nine
   square
 bbb    sdf@123.com
wef@123.com
hft@123.com

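So, counting from zero, your column 4 is wef@123.com. When you try to split that on commas, it doesn't contain any, so you get back a list with only one element, and then you ask for the second and third elements, which don't exist. That is also why [0] and [-1] work: both refer to that single element.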
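To see this for yourself, here is a minimal sketch that feeds that one line to a plain csv.reader (not the DictReader from the question, so the header row is left out of the picture) and then tries the same indexing. The variable names are only for illustration.

```python
import csv
import io

# The problematic data row, as quoted above.
line = "green, nine,   square, bbb    sdf@123.com,wef@123.com,hft@123.com"

# csv.reader splits on every comma; it has no idea which commas
# were meant to separate columns and which were meant to separate emails.
fields = next(csv.reader(io.StringIO(line)))
print(fields)

# Position 4 (counting from zero) holds a single address with no commas.
parts = fields[4].split(",")
print(parts)                 # a one-element list
print(parts[0], parts[-1])   # both refer to the only element, so no error

try:
    parts[1]
except IndexError as exc:
    print("IndexError:", exc)
```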

You need to fix your CSV file to actually be a CSV file.

That includes putting a comma after the bbb column, and after each column in the header.

And, more importantly, it means not using commas inside columns when you're using the same commas to separate the columns. The result is at best ambiguous, and therefore it can't be parsed.

Ways around this include:

Quote the strings with commas in them.
Escape the commas.
Use a different separator within the column.
Use a different separator between the columns.

(You could almost use ", " as a column delimiter here, but that's really hacky, and any human editing your file is going to break it.)
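As a rough illustration of two of those options, the sketch below quotes the field that contains commas (csv.writer does this on its own with its default settings) and, separately, uses a different separator between the columns. The semicolon is an arbitrary choice for the example, not something from your file.

```python
import csv
import io

row = ["blue", "eight", "line", "aaa", "abc@123.com,xyz@123.com,ghi@123.com"]

# Option 1: quote the field that contains commas.
# With default settings, csv.writer adds the quotes for any field that needs them.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()
print(quoted_line, end="")

# Reading the quoted line back gives the original five fields.
round_trip = next(csv.reader(io.StringIO(quoted_line)))
print(round_trip)

# Option 2: a different separator between the columns
# (a semicolon here, chosen only for illustration).
semi_line = "blue;eight;line;aaa;abc@123.com,xyz@123.com,ghi@123.com"
semi_fields = next(csv.reader(io.StringIO(semi_line), delimiter=";"))
print(semi_fields)
```

Either way, the commas inside the last field no longer compete with the column separators, so the parse is unambiguous.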

Here's an example that could work, with the multi-address column quoted:

 col0,  col1,   col2,  col3,     col4
 blue,  eight,  line,  aaa,     "abc@123.com,xyz@123.com,ghi@123.com"
 green, nine,   square, bbb,    "sdf@123.com,wef@123.com,hft@123.com"

Even with all that messy spacing (that you always get from human-edited files), this can be parsed cleanly and unambiguously with the right dialect parameters:

csv_reader = csv.DictReader(csv_file, skipinitialspace=True)

Now, each row looks like this:

{'col0': 'blue',
 'col1': 'eight',
 'col2': 'line',
 'col3': 'aaa',
 'col4': 'abc@123.com,xyz@123.com,ghi@123.com'}

… so now you can call row["col4"].split(",") and get back:

['abc@123.com', 'xyz@123.com', 'ghi@123.com']

And then, [1] and [2] will work.
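If you want to check this without touching your files, here is a small sketch that parses the data from an in-memory string. It assumes the col4 value is quoted, so the commas inside it are not treated as column separators; io.StringIO just stands in for the opened INPUT.csv.

```python
import csv
import io

# Stand-in for INPUT.csv: commas between all columns, and the
# multi-address field quoted (an assumption of this sketch).
data = (
    ' col0,  col1,   col2,  col3,     col4\n'
    ' blue,  eight,  line,  aaa,     "abc@123.com,xyz@123.com,ghi@123.com"\n'
    ' green, nine,   square, bbb,    "sdf@123.com,wef@123.com,hft@123.com"\n'
)

# skipinitialspace=True drops the spaces at the start of each field,
# which also lets the parser recognise the opening quote.
rows = list(csv.DictReader(io.StringIO(data), skipinitialspace=True))
print(dict(rows[0]))

emails = rows[0]["col4"].split(",")
print(emails[1], emails[2])
```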

However, you still have at least one more problem in your code. Your desired output includes columns 2 and 3, but you're explicitly leaving them out of the writerow.

While we're at it, there's no reason to try to cram 7 lines of code into one expression. So, why not just split the row once?

col456 = row["col4"].split(",")

And then, we can just modify row in-place:

row["col4"], row["col5"], row["col6"] = col456

… and now:

csv_writer.writerow(row)
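Putting the pieces together, a sketch of the whole thing might look like the following. The function name expand_emails is made up for this example, the input is assumed to have the multi-address field quoted, and the writeheader() call is my addition because your expected output shows a header row.

```python
import csv
import io


def expand_emails(in_file, out_file):
    """Copy rows from in_file to out_file, splitting col4 into col4..col6.

    Hypothetical helper for illustration; assumes every col4 value
    holds exactly three comma-separated addresses.
    """
    reader = csv.DictReader(in_file, skipinitialspace=True)
    fieldnames = ["col0", "col1", "col2", "col3", "col4", "col5", "col6"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Split once, then overwrite col4 and add col5 and col6 in place.
        # Unpacking raises ValueError if there are not exactly three parts.
        col456 = row["col4"].split(",")
        row["col4"], row["col5"], row["col6"] = (s.strip() for s in col456)
        writer.writerow(row)


# In-memory stand-ins for INPUT.csv and OUTPUT.csv.
source = io.StringIO(
    ' col0,  col1,   col2,  col3,     col4\n'
    ' blue,  eight,  line,  aaa,     "abc@123.com,xyz@123.com,ghi@123.com"\n'
    ' green, nine,   square, bbb,    "sdf@123.com,wef@123.com,hft@123.com"\n'
)
target = io.StringIO()
expand_emails(source, target)
output = target.getvalue()
print(output, end="")
```

With real files you would pass the two objects returned by open() instead of the StringIO objects; the csv docs recommend opening them with newline='' so the module handles line endings itself.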
