Reputation: 459
I'm using this code to convert an excel to a csv file, the problem is that some fields have the text enclosed in double quotes and when it does the conversion, I think it is enclosing in double quotes the white spaces. How I can avoid this?.
import xlrd
import unicodecsv
import sys
def xls_to_csv (xls_filename, csv_filename):
wb = xlrd.open_workbook(xls_filename)
sh = wb.sheet_by_index(0)
fh = open(csv_filename,"wb")
csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')
for row_number in range (1,sh.nrows):
csv_out.writerow(sh.row_values(row_number))
fh.close()
xls_to_csv(sys.argv[1],sys.argv[2])
Excel file:
Actual output:
My goal is a CSV without additional double quotes.
Upvotes: 0
Views: 1824
Reputation: 14519
The "additional" quotes are not data per se. The outermost quotes (one on each end) are just string delimiters. These delimiters are mandatory in the Excel dialect of CSV when the actual data contains quotes, which yours does. Within string delimiters, actual quotes need to be "escaped" by doubling them. (The first unescaped quote will be interpreted as the closing string delimiter.)
It sounds like you are trying to write a custom dialect of CSV which is not intended to be used by Excel. So you have to adjust the parameters accordingly:
csv_out = unicodecsv.writer(
fh,
encoding='utf-8',
delimiter=';',
quoting=unicodecsv.QUOTE_NONE,
quotechar=None)
Those settings will preserve existing data quote characters, but prevent adding any delimiter quote characters. (That is, the above settings will never add any delimiting quotes anywhere.) The rest of your code can stay exactly the same.
For more information, read Python 2's CSV docs carefully. (I assume you are using Python 2 because you are using unicodecsv
. It is designed to mimic the csv
module's interface as closely as possible. In fact, the recommended way to use it is to import it like this:
import unicodecsv as csv
so that the rest of your code looks exactly as though you are using Python's own csv
, except for the possible encoding
parameter. Incidentally, you can leave off that parameter if your desired encoding is UTF-8.)
Upvotes: 0
Reputation: 489
In case you want to do it in one function:
import xlrd
import unicodecsv
import sys
def xls_to_csv (xls_filename, csv_filename):
wb = xlrd.open_workbook(xls_filename)
sh = wb.sheet_by_index(0)
fh = open(csv_filename,"wb")
csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')
for row_number in range (1,sh.nrows):
row = []
row = [s.replace('"', '') for s in sh.row_values(row_number)]
csv_out.writerow(row)
fh.close()
xls_to_csv(sys.argv[1],sys.argv[2])
Upvotes: 0
Reputation: 46759
You could just strip any existing quotes from each cell before writing them as follows:
def xls_to_csv(xls_filename, csv_filename):
wb = xlrd.open_workbook(xls_filename)
sh = wb.sheet_by_index(0)
with open(csv_filename,"wb") as fh:
csv_out = unicodecsv.writer(fh, encoding='utf-8', delimiter=';')
for row_number in range (1, sh.nrows):
row = []
for col in sh.row_values(row_number):
try:
row.append(col.strip('"'))
except AttributeError:
row.append(col)
csv_out.writerow(row)
xls_to_csv(sys.argv[1],sys.argv[2])
Upvotes: 1