TheCrazySwede
TheCrazySwede

Reputation: 123

Convert XLSX to CSV without losing values from formulas

I've tried a few methods, including pandas:

df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')

But every time I convert my xlsx file over to csv format, I lose all data within columns that include a formula. I have a formula that concatenates values from two other cells + '@domain' to create user emails, but this entire column returns blank in the csv product. The formula is basically this:

=CONCATENATE(B2,".",E2,"@domain")

The conversion is part of a larger code workflow, but it won't work if this column is left blank. The only thing I've tried that worked was this API, but I'd rather not pay a subscription if this can be done locally on the machine.

Any ideas? I'll try whatever you throw at me - bear in mind I'm new to this, but I will do my best!

Upvotes: 1

Views: 2024

Answers (1)

MegaMinx
MegaMinx

Reputation: 119

You can try to open the excel file with the openpyxl library in the data-only mode. This will prevent the raw formulas - they are going to be calculated just the way you see them in excel itself.

import openpyxl 

wb = openpyxl.load_workbook(filename, data_only=True)

Watch out when youre working with you original file and save it with the openpyxl-lib in the data-only-mode all your formulas will be lost. I had this once and it was horrible. So i recommend using a copy of your file to work with.

Since you have your xlsx-file with values only you can now use the internal csv library to generate a proper csv-file (idea from this post: How to save an Excel worksheet as CSV):

import csv

sheet = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as f:
    c = csv.writer(f)
    for r in sheet.iter_rows(): # generator; was sh.rows
        c.writerow([cell.value for cell in r])

Upvotes: 1

Related Questions