CSV Writer truncates characters in sequence in Excel 2013

Question

I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to csv file:

import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line=line.strip()
            with open(csvfile, "ab") as f:            
                if line.startswith("# Online_Resource"):   
                    write = csv.writer(f, dialect='excel',
                                       delimiter='	',
                                       lineterminator="	",
                                       )
                    write.writerow([line.lstrip("# ")])

                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='	',
                                       lineterminator="	",
                                       )
                    write.writerow([line.lstrip("# ")])

Here is a sample of some strings from the original text file:

# Online_Resource: https://www.ncdc.noaa.gov/
## Corg%    percent organic carbon,,,%,,paleoceanography,,,N

What is really bizarre is the final csv file looks good, except the characters in the first column only (those with the # originally) partially "overwrite" each other when I try to manually delete some characters from the cell:

Oddly enough, too, there seems to be no formula to how the characters get jumbled each time I try to delete some after running the script. I tried encoding the csv file as unicode to no avail.

Thanks.

Jean-Fran&#231;ois Fabre · Accepted Answer

You've selected excel dialect but you overrode it with weird parameters:

You're using TAB as separator and line terminator, which creates a 1-line CSV file. Close enough to "truncated" to me
Also quotechar shouldn't be a space.

This conveyed a nice side-effect as you noted: the csv module actually splits the lines according to commas!

The code is inefficient and error-prone: you're opening the file in append mode in the loop and create a new csv writer each time. Better done outside the loop.

Also, comma split must be done by hand now. So even better: use csv module to read the file as well. My fix proposal for your routine:

import os
import csv

def csv_save_use(textfile, csvfile):
  with open(textfile, "rU") as text, open(csvfile, "wb") as f:
    write = csv.writer(f, dialect='excel',
                       delimiter='	')
    reader = csv.reader(text, delimiter=",")
    for row in reader:
        if not row:
            continue  # skip possible empty rows            
        if row[0].startswith("# Online_Resource"):
            write.writerow([row[0].lstrip("# ")])

        elif row[0].startswith("##"):
            write.writerow([row[0].lstrip("# ")]+row[1:]) # write row, stripping the first item from hashes

Note that the file isn't properly displayed in excel unless to remove delimiter=' (reverts back to default comma)

Also note that you need to replace open(csvfile, "wb") as f by open(csvfile, "w",newline='') as f for Python 3.

here's how the output looks now (note that the empty cells are because there are several commas in a row)

CSV Writer truncates characters in sequence in Excel 2013

Answers (2)

Related Questions