Bon
Bon

Reputation: 11

Is there a way to get the computed values of long formulae from XLSX file using Python

I am trying to read many XLSX files, and I don't care about the function, I just need the value it returns. The file is not and will not be modified, so getting the cache values in those cells is also fine for me.

I have read and tried many solutions on StackOverflow but none of them work:

  1. Some people say that pandas will read the value instead of formulae by default
df = pd.read_excel(filename)

I have tried it, but all cells with a formula in my Dataframe become nan

  1. I tried with openpyxl:

With data_only = True, all the cell is also nan

from openpyxl import load_workbook
wb = load_workbook(filename = f, data_only = True)

2.1. I also tried with data_only = False, and then compute the value with pycel

from openpyxl import load_workbook
wb = load_workbook(filename = f, data_only = True)
from pycel.excelcompiler import ExcelCompiler
cal = ExcelCompiler(f)
cal.evaluate(f"{name}!{cell}")

This one is nearly good enough for me since it can compute simple formula such as =IFERROR(E98/E78,"-"), but there's some complicated one such as =(SUM(AC38:AD38)/BH45)/(SUM(Q38:R38)/BD45)-1 I got a recursion error:

RecursionError: maximum recursion depth exceeded in comparison
Eval: vlookup(_C_("Key Metrics!C80"), _R_("YipitData Metrics (1P)!C8:AAT27"), match(_C_("Key Metrics!D7"), _R_("YipitData Metrics (1P)!C5:AAT5"), 0), 0) / vlookup("Implied JD Mall 1P GMV Net/Gross & Coverage Ratio", _R_("Historical Accuracy!B60:ADD70"), match(_C_("Key Metrics!D7"), _R_("Historical Accuracy!B58:ADD58"), 0), 0)

The full log of this error is really long, but all of them are During handling of the above exception, another exception occurred: and then another RecurssionError

2.2. Another attempt to compute the formula again with formulas but when I run this line of code: model = formulas.ExcelModel().load(filename).finish() I got the error:

Failed DISPATCHING 'CHAR' due to:
  NotImplementedError()

The log is much longer but I cannot include it all as StackOverflow says my post looks like a spam

  1. I also tried with xlrd, but the result is the same as pandas, all those cells become nan.

Read the file by openpyxl with the flag data_only=False and then calculate the value again with pycel is my best shot now, but it is not enough as it cannot read such long formula like I quoted above. Is there any change or another library that can fix my issue? Thank you!

Upvotes: 1

Views: 197

Answers (0)

Related Questions