Malonge
Malonge

Reputation: 2040

Convert Float to Integer During xcel to Tab Delimited File Conversion

I have written the following method that converts a .xlsx file to a .txt tab delimited file.

import sys
import xlrd
import csv

def xlsx_to_tab(self, inFile):
    ''' Convert an xlsx file to a tab delimited file. '''
    excel_file = xlrd.open_workbook(inFile)
    worksheet = excel_file.sheet_names()[0]
    sh = excel_file.sheet_by_name(worksheet)
    extentPos = inFile.rfind('.')
    tab_file = open(inFile[:extentPos] + '.txt', 'w')
    writetab = csv.writer(tab_file, delimiter='\t', quoting=csv.QUOTE_ALL)

    for row in range(sh.nrows):
        writetab.writerow(sh.row_values(row))

    tab_file.close()

This method successfully converts between the file formats, however, what appears to be integers in the original excel file are output as floats in the .txt file output.

I understand this may have to do with cell format settings in excel. That being said, I would like to handle this within this script, as I would like to explicitly define elements within the file as an integer with int(), as opposed to checking and altering all the excel files before hand which would be tedious.

This method processes the file row by row. Is there a way to retrieve individual elements within each row so as to type cast them as an integer?

This question kind of addresses the issue:

But shows how to do it for one value at a time as opposed to a whole column of values.

To be a little more clear, I am trying to cast individual columns as integers but not all of them. So only some elements of each row need to be converted.

The numeric columns are columns 1,3, and 5

Upvotes: 0

Views: 635

Answers (3)

Anzel
Anzel

Reputation: 20563

May also consider this to check if the cell is a float/integer/text and convert it using map with a custom function, like this:

def convert_int_or_string(n):
    try:
        float(n)
        assert str(n).lower() != 'nan'
        return int(float(n))
    except (ValueError, AssertionError):
        return n

test_row 
['3.14',
 'nan',
 '12.143141234',
 'hello',
 '0.13989402028',
 'world',
 '0.26575558139',
 '11',
 '0.919189932407',
 '!']

map(convert_int_or_string, test_row)
[3, 'nan', 12, 'hello', 0, 'world', 0, 11, 0, '!']

Add this convert_int_or_string function in your module and change this line to:

writetab.writerow(map(convert_int_or_string, sh.row_values(row)))

Upvotes: 0

martineau
martineau

Reputation: 123481

You could do it like this:

for row in range(sh.nrows):
    new_row = [int(x) if i in {1,3,5} else x for i, x in
        enumerate(sh.row_values(row), start=1)]
    writetab.writerow(new_row)

Upvotes: 1

dursk
dursk

Reputation: 4445

Use list comprehension to convert the elements of the row to int before writing them to the new file:

for row in range(sh.nrows):
    new_row = [int(x) for x in sh.row_values(row)]
    writetab.writerow(new_row)

EDIT: with OP's new comments below, what you need to do (although it's not pretty) is:

int_columns = [1, 3, 5]
for row in range(sh.nrows):
    new_row = sh.row_values(row)
    for col in int_columns:
        new_row[col] = int(new_row[col])
    writetab.writerow(new_row)

And I would suggest you change your naming a bit. In the outermost loop, you're iterating over integers, but you name your variable row. I'd write it like:

int_columns = [1, 3, 5]
for i in range(sh.nrows):
    row = sh.row_values(i)
    for col in int_columns:
        row[col] = int(new_row[col])
    writetab.writerow(row)

Upvotes: 1

Related Questions