STGR
STGR

Reputation: 31

Convert .xlsb to .xlsx - Multi-sheet Microsoft Excel File

I have a multi-sheet excel file saved in a .xlsb format that I wish to covert to .xlsx to utilize the openpyxl library - code already written to support the same workbook that used to be .xlsx until macro-enabled, and wouldn't save in .xlsm.

I have managed to convert from .xlsb to .csv, but cannot convert any further and have hit roadblocks with various libraries due to various formatting errors.

As my file has multiple sheets (all tables) I only need to copy and paste the text on every sheet (keeping the sheet names) and get it to a .xlsx format.

For simplicity sake, imagine all I need to do is: get sheet names, access a sheet, determine max row/column, loop: copy and paste cell values, write to .xlsx with sheet name. With the starting file being .xlsb.

Any suggestion would be much appreciated.

Upvotes: 3

Views: 23970

Answers (6)

Joseelmax
Joseelmax

Reputation: 51

Nobody seems to actually reply to the question which is how to parse a MULTI-SHEET .xlsb file to .xlsx, OP even specified he needs to maintain sheet names and that's also what I was looking for.

Anyway, here you go:

First install pyxlsb

pip install pyxlsb

Then run the code

# Convert a multi-sheet .xlsb a .xlsx
import pandas as pd

excel_file = pd.ExcelFile('filename.xlsb', engine='pyxlsb')
sheet_dict = excel_file.parse(sheet_name=None, header=None)
writer = pd.ExcelWriter("filename.xlsx")
for sheet_name in sheet_dict:
    sheet_dict[sheet_name].to_excel(writer, sheet_name, index=False, header=False)
writer.close()

NOTE: This will not maintain color, formatting or cell sizes as it converts the file to dataframe before saving.

Upvotes: 1

bjek30d10
bjek30d10

Reputation: 166

Try the latest xlsb2xlsx package on PyPI:

pip install xlsb2xlsx
python -m xlsb2xlsx /filepath_with_xlsb_file

See https://pypi.org/project/xlsb2xlsx/ for more info.

Upvotes: 2

deepesh
deepesh

Reputation: 81

We can use Pandas, but we can lose some information from .xlsb (such as color, border, images, etc) Best way to convert .xlsb to .xlsx format without losing information is using aspose-cells Module in Python, it uses java in backend. We can install this module using pip install aspose-cells

# Use Aspose.Cells for Python via Java
# Install java runtime 64 bit
import jpype
import asposecells
jpype.startJVM()
from asposecells.api import *

# Open an excel file
workbook = Workbook(r"xlsb_filepath.xlsb")
workbook.save(r"xlsx_filepath.xlsx")

reference from:

https://blog.aspose.com/2021/05/28/convert-excel-to-image-in-python/#:~:text=Python%20Excel%20to%20Image%20Converter%20API%20In%20order,can%20install%20the%20API%20using%20the%20following%20command.

https://products.aspose.com/cells/cpp/conversion/xlsb-to-xlsx/

Upvotes: -1

user16810928
user16810928

Reputation: 1

I got copy code to test run,but that return error,above error . ValueError Traceback (most recent call last) in () ----> 1 df = pd.read_excel(r'C:\Users\l84193928\Desktop\test.xlsb', engine='pyxlsb')

D:\Users\l84193928\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util_decorators.py in wrapper(*args, **kwargs) 176 else: 177 kwargs[new_arg_name] = new_arg_value --> 178 return func(*args, **kwargs) 179 return wrapper 180 return _deprecate_kwarg

D:\Users\l84193928\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util_decorators.py in wrapper(*args, **kwargs) 176 else: 177 kwargs[new_arg_name] = new_arg_value --> 178 return func(*args, **kwargs) 179 return wrapper 180 return _deprecate_kwarg

D:\Users\l84193928\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\excel.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds) 305 306 if not isinstance(io, ExcelFile): --> 307 io = ExcelFile(io, engine=engine) 308 309 return io.parse(

D:\Users\l84193928\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\excel.py in init(self, io, **kwds) 367 368 if engine is not None and engine != 'xlrd': --> 369 raise ValueError("Unknown engine: {engine}".format(engine=engine)) 370 371 # If io is a url, want to keep the data as bytes so can't pass

ValueError: Unknown engine: pyxlsb

Upvotes: 0

Tommaso Fontana
Tommaso Fontana

Reputation: 730

Office answer

Newer version of Microsoft Office Excel, OpenOffice Calc or LibreOffice Calc can read xlsb.

So you'll just need to open the file in one of those and export it to xlsx

Reference

Python answer

CSV -> xlsx

This can be done easily with pandas, a python package.

import pandas as pd
df = pd.read_csv("/path/to/file.csv")
df.to_excel("/path/to/result.xlsx")

xlsb -> xlsx

As mentioned in this answer, pandas 1.0.0 added the support for binary excel files.

import pandas as pd
df = pd.read_excel("/path/to/file.xlsb", engine="pyxlsb")
df.to_excel("/path/to/result.xlsx")

In order to use it you must upgrade pandas, and install pyxlsb and 'openpyxl':

pip install pandas --upgrade
pip install pyxlsb

(this assumes the use of python 3 since python 2 reached End of Life)

If you need more control over the file you can directly use pyxlsb

Upvotes: 0

andmck
andmck

Reputation: 111

Just tested this.

Pandas now supports xlsb and can open these files using Glen Thompson's method described in Read XLSB File in Pandas Python:

import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')

And then you could use:

df.to_excel('path_to_file.xlsx')

Make sure you pip install pyxlsb, openpyxl and xlrd, I always forget.

Upvotes: 4

Related Questions