I have written the following method that converts a .xlsx file to a tab-delimited .txt file.
import sys
import xlrd
import csv

def xlsx_to_tab(self, inFile):
    ''' Convert an xlsx file to a tab delimited file. '''
    excel_file = xlrd.open_workbook(inFile)
    worksheet = excel_file.sheet_names()[0]
    sh = excel_file.sheet_by_name(worksheet)
    extentPos = inFile.rfind('.')
    tab_file = open(inFile[:extentPos] + '.txt', 'w')
    writetab = csv.writer(tab_file, delimiter='\t', quoting=csv.QUOTE_ALL)
    for row in range(sh.nrows):
        writetab.writerow(sh.row_values(row))
    tab_file.close()
This method successfully converts between the file formats; however, values that appear to be integers in the original Excel file are written as floats in the .txt output.
I understand this may have to do with cell format settings in Excel. Even so, I would like to handle it within this script by explicitly converting elements in the file with int(), rather than checking and altering all the Excel files beforehand, which would be tedious.
This method processes the file row by row. Is there a way to retrieve individual elements within each row so as to type cast them as an integer?
This question kind of addresses the issue:
But it shows how to do it for one value at a time, as opposed to a whole column of values.
To be a little clearer, I am trying to cast individual columns to integers, but not all of them. So only some elements of each row need to be converted.
The numeric columns are columns 1, 3, and 5.
Upvotes: 0
Views: 635
You may also consider checking whether each cell is a float, an integer, or text, and converting it using map with a custom function, like this:
def convert_int_or_string(n):
    try:
        float(n)
        assert str(n).lower() != 'nan'
        return int(float(n))
    except (ValueError, AssertionError):
        return n
test_row = ['3.14', 'nan', '12.143141234', 'hello', '0.13989402028',
            'world', '0.26575558139', '11', '0.919189932407', '!']
list(map(convert_int_or_string, test_row))
# [3, 'nan', 12, 'hello', 0, 'world', 0, 11, 0, '!']
Add this convert_int_or_string function to your module and change the writerow line to:
writetab.writerow(map(convert_int_or_string, sh.row_values(row)))
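Note that convert_int_or_string truncates every numeric value, so 3.14 becomes 3. If that is not wanted, a variant that converts only whole-valued floats might look like the sketch below. The name to_int_if_whole is hypothetical, and the sketch assumes numeric cells arrive as Python floats, as the question describes.

```python
def to_int_if_whole(value):
    """Return an int for whole-valued floats; leave everything else unchanged."""
    # Assumption: numeric cells arrive as floats, so 2.0 probably
    # represents an integer in the spreadsheet, while 3.14 does not.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


sample_row = [1.0, "hello", 3.14, 42.0, ""]
converted = [to_int_if_whole(v) for v in sample_row]
print(converted)  # [1, 'hello', 3.14, 42, '']
```

It could be dropped into the loop the same way, for example writetab.writerow([to_int_if_whole(v) for v in sh.row_values(row)]).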
Upvotes: 0
You could do it like this:
for row in range(sh.nrows):
    new_row = [int(x) if i in {1, 3, 5} else x for i, x in
               enumerate(sh.row_values(row), start=1)]
    writetab.writerow(new_row)
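If the sheet has a header row or empty cells in those columns, int(x) will raise a ValueError. A minimal sketch of a more forgiving helper follows; cast_columns is a hypothetical name, the sample rows are made up, and columns are counted from 1 as in the snippet above.

```python
def cast_columns(row, int_columns):
    """Cast the 1-based columns in int_columns to int where possible."""
    new_row = []
    for position, value in enumerate(row, start=1):
        if position in int_columns:
            try:
                value = int(value)
            except (TypeError, ValueError):
                pass  # e.g. a header label or an empty cell: keep it unchanged
        new_row.append(value)
    return new_row


header = ["id", "name", "count", "city", "score"]
data = [7.0, "Ann", 12.0, "Oslo", 3.0]
print(cast_columns(header, {1, 3, 5}))  # ['id', 'name', 'count', 'city', 'score']
print(cast_columns(data, {1, 3, 5}))    # [7, 'Ann', 12, 'Oslo', 3]
```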
Upvotes: 1
Use a list comprehension to convert the elements of the row to int before writing them to the new file:
for row in range(sh.nrows):
    new_row = [int(x) for x in sh.row_values(row)]
    writetab.writerow(new_row)
EDIT: with OP's new comments below, what you need to do (although it's not pretty) is:
int_columns = [1, 3, 5]  # list indices are 0-based; use [0, 2, 4] if columns are counted from 1
for row in range(sh.nrows):
    new_row = sh.row_values(row)
    for col in int_columns:
        new_row[col] = int(new_row[col])
    writetab.writerow(new_row)
And I would suggest you change your naming a bit. In the outermost loop, you're iterating over integers, but you name your variable row. I'd write it like:
int_columns = [1, 3, 5]
for i in range(sh.nrows):
    row = sh.row_values(i)
    for col in int_columns:
        row[col] = int(row[col])
    writetab.writerow(row)
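To try the idea without an Excel file, here is a rough end-to-end sketch that uses in-memory rows in place of sh.row_values() and writes to a string buffer instead of a .txt file. The function name rows_to_tab and the sample data are hypothetical; the csv.writer settings match the ones in the question.

```python
import csv
import io


def rows_to_tab(rows, int_indexes):
    """Write rows as tab-delimited text, casting the given 0-based columns to int."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", quoting=csv.QUOTE_ALL)
    for row in rows:
        row = list(row)  # copy so the caller's data is not modified
        for col in int_indexes:
            row[col] = int(row[col])
        writer.writerow(row)
    return buffer.getvalue()


# Stand-in for the floats that come back from numeric spreadsheet cells.
rows = [["a", 1.0, "b", 2.0], ["c", 3.0, "d", 4.0]]
text = rows_to_tab(rows, [1, 3])
print(text)
```

With the cast in place the numeric fields are written as "1" and "2" rather than "1.0" and "2.0".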
Upvotes: 1