JohnDole

Reputation: 565

Fast way to convert xlsx to csv with python

I have a tool where users can upload large xlsx files. We need to convert these xlsx files to csv for processing. However, I have not found a fast way to convert an xlsx file to csv. We are not able to use a VBS script (which was very fast). I tried various approaches, such as pandas and openpyxl:

pandas

read_file = pd.read_excel(os.path.join(path, old_filename), engine="openpyxl")
read_file.to_csv(os.path.join(path, new_filename), index=False, header=True)

openpyxl

wb = openpyxl.load_workbook(file, data_only=True)
sh = wb.active  # was .get_active_sheet()
with open(os.path.join(path, filename), 'w', newline="") as f:
    c = csv.writer(f)
    for r in sh.iter_rows():  # generator; was sh.rows
        c.writerow([cell.value for cell in r])

But a 60 MB xlsx file takes about 4 minutes to convert to csv.

Is there a way to make the conversion faster? I am open to any solution.

Upvotes: 0

Views: 1133

Answers (1)

Mahrkeenerh

Reputation: 1121

Do not iterate over every cell inside each row; copy each row whole.

# sh is an xlrd sheet here: nrows and row_values() are xlrd's API, not openpyxl's
for rownum in range(sh.nrows):
    c.writerow(sh.row_values(rownum))

https://stackoverflow.com/a/20105297/13000953
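
With openpyxl, as used in the question, the same whole-row idea might look like the sketch below. It assumes openpyxl 2.6 or later (for `values_only=True`) and opens the workbook in read-only mode so rows are streamed instead of building a full cell tree. The function names `rows_to_csv` and `xlsx_to_csv` are made up for illustration, and the actual speed-up on a 60 MB file is untested here.

```python
import csv


def rows_to_csv(rows, csv_path):
    """Write an iterable of row tuples/lists to csv_path in one call."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # writerows consumes the iterable directly; None is written as an empty field
        csv.writer(f).writerows(rows)


def xlsx_to_csv(xlsx_path, csv_path):
    """Hypothetical helper: stream the active sheet of an xlsx file to CSV."""
    from openpyxl import load_workbook  # third-party; imported lazily

    # read_only=True streams the sheet; data_only=True returns cached formula results
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        # values_only=True yields plain tuples of values, no Cell objects
        rows_to_csv(wb.active.iter_rows(values_only=True), csv_path)
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed
```

Usage would be something like `xlsx_to_csv("upload.xlsx", "upload.csv")`, with both paths being placeholders.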

Upvotes: 1
