How to load xls file from an Amazon S3 bucket and convert to xlsx and save to Amazon S3

I am trying to load an xls file and convert to xlsx from an Amazon S3 bucket.

really appreciate if someone knows how to do:

read xls file from s3
convert xls to xlsx and save in s3.

Constrains: 1.This is an enterprise projet thus cannot be downloading the file to my local and convert and reupload 2. our internal application is running on EC2 linux, thus cannot install packages that works for windows like win32.client to do the conversion.

thank you!!

Upvotes: 0

Answers (1)

droebi

Reputation: 946

Try following: When you run it inside AWS: Python: How to read and load an excel file from AWS S3?

When your run it local:

Download this file:

Convert xls to xlsx:

 def xls_to_xlsx(*args, **kw):
     book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
     book_xlsx = openpyxl.workbook.Workbook()

     sheet_names = book_xls.sheet_names()
     for sheet_index in range(len(sheet_names)):
         sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
         if sheet_index == 0:
             sheet_xlsx = book_xlsx.active
             sheet_xlsx.title = sheet_names[sheet_index]
         else:
             sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
         for crange in sheet_xls.merged_cells:
             rlo, rhi, clo, chi = crange
             sheet_xlsx.merge_cells(start_row=rlo + 1, end_row=rhi,
                                    start_column=clo + 1, end_column=chi, )

         def _get_xlrd_cell_value(cell):
             value = cell.value
             if cell.ctype == xlrd.XL_CELL_DATE:
                 if value == 1.0:
                     datetime_tup = (0, 0, 0)
                 else:
                     datetime_tup = xlrd.xldate_as_tuple(value, 0)

                 if datetime_tup[0:3] == (0, 0, 0):
                     value = datetime.time(*datetime_tup[3:])
                 else:
                     value = datetime.datetime(*datetime_tup)
             return value

         for row in range(sheet_xls.nrows):
             sheet_xlsx.append((
                 _get_xlrd_cell_value(cell)
                 for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
             ))

     return book_xlsx

Upload it back to S3:

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html

Upvotes: 1

How to load xls file from an Amazon S3 bucket and convert to xlsx and save to Amazon S3

Answers (1)

Related Questions