snl330
snl330

Reputation: 113

CSV Writer truncates characters in sequence in Excel 2013

I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to csv file:

import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line=line.strip()
            with open(csvfile, "ab") as f:            
                if line.startswith("# Online_Resource"):   
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])

                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])

Here is a sample of some strings from the original text file:

# Online_Resource: https://www.ncdc.noaa.gov/
## Corg%    percent organic carbon,,,%,,paleoceanography,,,N

What is really bizarre is the final csv file looks good, except the characters in the first column only (those with the # originally) partially "overwrite" each other when I try to manually delete some characters from the cell:

Result

Oddly enough, too, there seems to be no formula to how the characters get jumbled each time I try to delete some after running the script. I tried encoding the csv file as unicode to no avail.

Thanks.

Upvotes: 1

Views: 693

Answers (2)

Jean-François Fabre
Jean-François Fabre

Reputation: 140266

You've selected excel dialect but you overrode it with weird parameters:

  • You're using TAB as separator and line terminator, which creates a 1-line CSV file. Close enough to "truncated" to me
  • Also quotechar shouldn't be a space.

This conveyed a nice side-effect as you noted: the csv module actually splits the lines according to commas!

The code is inefficient and error-prone: you're opening the file in append mode in the loop and create a new csv writer each time. Better done outside the loop.

Also, comma split must be done by hand now. So even better: use csv module to read the file as well. My fix proposal for your routine:

import os
import csv

def csv_save_use(textfile, csvfile):
  with open(textfile, "rU") as text, open(csvfile, "wb") as f:
    write = csv.writer(f, dialect='excel',
                       delimiter='\t')
    reader = csv.reader(text, delimiter=",")
    for row in reader:
        if not row:
            continue  # skip possible empty rows            
        if row[0].startswith("# Online_Resource"):
            write.writerow([row[0].lstrip("# ")])

        elif row[0].startswith("##"):
            write.writerow([row[0].lstrip("# ")]+row[1:]) # write row, stripping the first item from hashes

Note that the file isn't properly displayed in excel unless to remove delimiter='\t (reverts back to default comma)

Also note that you need to replace open(csvfile, "wb") as f by open(csvfile, "w",newline='') as f for Python 3.

here's how the output looks now (note that the empty cells are because there are several commas in a row)

enter image description here

Upvotes: 2

John Machin
John Machin

Reputation: 82992

more problems:

  1. line=line.strip(" ") removes leading and trailing spaces. It doesn't remove \r or \n ... try line=line.strip() which removes leading and trailing whitespace

  2. you get all your line including commas in one cell because you haven't split it up somehow ... like using a csv.reader instance. See here: https://docs.python.org/2/library/csv.html#csv.reader

  3. str.lstrip non-default arg is treated as a set of characters to be removed, so '## ' has the same effect as '# '. if guff.startswith('## ') then do guff = guff[3:] to get rid of the unwanted text

  4. It is not very clear at all what the sentence containing "bizarre" means. We need to see exactly what is in the output csv file. Create a small test file with 3 records (1) with '# Online_Resource' (2) with "## " (3) none of the above, run your code, and show the output, like this:

print repr(open('testout.csv', 'rb').read())

Upvotes: 1

Related Questions