Reputation: 11
I am trying to load an xls file and convert to xlsx from an Amazon S3 bucket.
really appreciate if someone knows how to do:
Constrains: 1.This is an enterprise projet thus cannot be downloading the file to my local and convert and reupload 2. our internal application is running on EC2 linux, thus cannot install packages that works for windows like win32.client to do the conversion.
thank you!!
Upvotes: 0
Views: 1300
Reputation: 946
Try following: When you run it inside AWS: Python: How to read and load an excel file from AWS S3?
When your run it local:
Convert xls to xlsx:
def xls_to_xlsx(*args, **kw):
book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
book_xlsx = openpyxl.workbook.Workbook()
sheet_names = book_xls.sheet_names()
for sheet_index in range(len(sheet_names)):
sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
if sheet_index == 0:
sheet_xlsx = book_xlsx.active
sheet_xlsx.title = sheet_names[sheet_index]
else:
sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
for crange in sheet_xls.merged_cells:
rlo, rhi, clo, chi = crange
sheet_xlsx.merge_cells(start_row=rlo + 1, end_row=rhi,
start_column=clo + 1, end_column=chi, )
def _get_xlrd_cell_value(cell):
value = cell.value
if cell.ctype == xlrd.XL_CELL_DATE:
if value == 1.0:
datetime_tup = (0, 0, 0)
else:
datetime_tup = xlrd.xldate_as_tuple(value, 0)
if datetime_tup[0:3] == (0, 0, 0):
value = datetime.time(*datetime_tup[3:])
else:
value = datetime.datetime(*datetime_tup)
return value
for row in range(sheet_xls.nrows):
sheet_xlsx.append((
_get_xlrd_cell_value(cell)
for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
))
return book_xlsx
Upload it back to S3:
Upvotes: 1