Firefly
Firefly

Reputation: 459

Python xlsx to csv

I'm using this code to convert an excel to a csv file, the problem is that some fields have the text enclosed in double quotes and when it does the conversion, I think it is enclosing in double quotes the white spaces. How I can avoid this?.

import xlrd
import unicodecsv
import sys

def xls_to_csv (xls_filename, csv_filename):

    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)

    fh = open(csv_filename,"wb")
    csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')

    for row_number in range (1,sh.nrows):
        csv_out.writerow(sh.row_values(row_number))

    fh.close()
xls_to_csv(sys.argv[1],sys.argv[2])

Excel file:

Excel file

Actual output:

enter image description here

My goal is a CSV without additional double quotes.

Upvotes: 0

Views: 1824

Answers (3)

John Y
John Y

Reputation: 14519

The "additional" quotes are not data per se. The outermost quotes (one on each end) are just string delimiters. These delimiters are mandatory in the Excel dialect of CSV when the actual data contains quotes, which yours does. Within string delimiters, actual quotes need to be "escaped" by doubling them. (The first unescaped quote will be interpreted as the closing string delimiter.)

It sounds like you are trying to write a custom dialect of CSV which is not intended to be used by Excel. So you have to adjust the parameters accordingly:

csv_out = unicodecsv.writer(
    fh,
    encoding='utf-8',
    delimiter=';',
    quoting=unicodecsv.QUOTE_NONE,
    quotechar=None)

Those settings will preserve existing data quote characters, but prevent adding any delimiter quote characters. (That is, the above settings will never add any delimiting quotes anywhere.) The rest of your code can stay exactly the same.

For more information, read Python 2's CSV docs carefully. (I assume you are using Python 2 because you are using unicodecsv. It is designed to mimic the csv module's interface as closely as possible. In fact, the recommended way to use it is to import it like this:

import unicodecsv as csv

so that the rest of your code looks exactly as though you are using Python's own csv, except for the possible encoding parameter. Incidentally, you can leave off that parameter if your desired encoding is UTF-8.)

Upvotes: 0

cheng chen
cheng chen

Reputation: 489

In case you want to do it in one function:

import xlrd
import unicodecsv
import sys

def xls_to_csv (xls_filename, csv_filename):

    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)

    fh = open(csv_filename,"wb")
    csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')

    for row_number in range (1,sh.nrows):
        row = []
        row = [s.replace('"', '') for s in sh.row_values(row_number)]
        csv_out.writerow(row)

    fh.close()
xls_to_csv(sys.argv[1],sys.argv[2])

Upvotes: 0

Martin Evans
Martin Evans

Reputation: 46759

You could just strip any existing quotes from each cell before writing them as follows:

def xls_to_csv(xls_filename, csv_filename):
    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)

    with open(csv_filename,"wb") as fh:
        csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')

        for row_number in range (1, sh.nrows):
            row = []
            for col in sh.row_values(row_number):
                try:
                    row.append(col.strip('"'))
                except AttributeError:
                    row.append(col)

            csv_out.writerow(row)

xls_to_csv(sys.argv[1],sys.argv[2]) 

Upvotes: 1

Related Questions