Reputation: 6450

Excel formula evaluation in Pandas

Say i have a dict like this whose key's are cell references in excel and value's are either formula or integer.

input_dict = {
    "B25": "=B2*B4",
    "C25": "=C2*C4",
    "D25": "=D2*D4",
    "E25": "=E2*E4",
    "F25": "=F2*F4",
    "G25": "=G2*G4",
    "B22": 0,
    "C22": "=SUM(B22:B28)",
    "D22": "=SUM(C22:C28)",
    "E22": "=SUM(D22:D28)",
    "F22": "=SUM(E22:E28)",
    "G22": "=SUM(F22:F28)",
    "B28": "=B19*B20*B22",
    "C28": "=C19*C20*C22",
    "D28": "=D19*D20*D22",
    "E28": "=E19*E20*E22",
    "F28": "=F19*F20*F22",
    "G28": "=G19*G20*G22",
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

How to perform the calculation on this type of data.

My approach was to convert the dict to DataFrame and perform the calculations on it but i am stuck.

df = pd.DataFrame(list(input_dict.items()))

The output of df is shown below.

    0   1
0   G22 =SUM(F22:F28)
1   G4  0.6
2   F2  840
3   D2  880
4   C20 3
5   C4  0.9
6   B28 =B19*B20*B22
7   F25 =F2*F4
8   B25 =B2*B4
9   G25 =G2*G4
10  C28 =C19*C20*C22
11  G28 =G19*G20*G22
12  F22 =SUM(E22:E28)
13  C25 =C2*C4
14  B19 0.001
15  E4  0.8
16  D22 =SUM(C22:C28)
17  D4  0.8
18  G2  800
19  E28 =E19*E20*E22
20  D20 4
21  G20 4
22  E25 =E2*E4
23  F20 3
24  G19 0.002
25  E22 =SUM(D22:D28)
26  C2  900
27  D25 =D2*D4
28  E2  860
29  D28 =D19*D20*D22
30  C19 0.001
31  F28 =F19*F20*F22
32  B20 4
33  B2  1000
34  F4  0.7
35  E19 0.001
36  D19 0.001
37  B4  0.95
38  B22 0
39  F19 0.001
40  C22 =SUM(B22:B28)
41  E20 4

How to perform excel like calculations in Python ?

The expected output is as below

{
    "B25": "950",
    "C25": "810",
    "D25": "704",
    "E25": "688",
    "F25": "588",
    "G25": "480",
    "B22": 0,
    "C22": 950,
    "D22": 1757.15,
    "E22": 2454.1214,
    "F22": 3710.908,
    "G22": 4161.220736,
    "B28": 0,
    "C28": -2.85,
    "D28": -7.0286,
    "E28": -9.8164856,
    "F28": -9.396914743,
    "G28": -29.687264,
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

Upvotes: 2

Answers (3)

bradbase

Reputation: 499

I have written a library, xlcalculator, that reads Excel files, interprets formulas with supported functions into Python and can subsequently evaluates the formulas.

An example if its use using the problem as you've stated;

input_dict = {
    "B4": 0.95,
    "B2": 1000,
    "B19": 0.001,
    "B20": 4,
    # B21
    "B22": 1,
    "B23": 2,
    "B24": 3,
    "B25": "=B2*B4",
    "B26": 5,
    "B27": 6,
    "B28": "=B19*B20*B22",
    "C22": "=SUM(B22:B28)",
  }

from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator

compiler = ModelCompiler()
my_model = compiler.read_and_parse_dict(input_dict)
evaluator = Evaluator(my_model)

for formula in my_model.formulae:
    print("Formula", formula, "evaluates to", evaluator.evaluate(formula))

# cells need a sheet and Sheet1 is default.
evaluator.set_cell_value("Sheet1!B22", 100)
print("Formula B28 now evaluates to", evaluator.evaluate("Sheet1!B28"))
print("Formula C22 now evaluates to", evaluator.evaluate("Sheet1!C22"))

produces this output;

Formula Sheet1!B25 evaluates to 950.0
Formula Sheet1!B28 evaluates to 0.004
Formula Sheet1!C22 evaluates to 967.004
Formula B28 now evaluates to 0.4
Formula C22 now evaluates to 1066.4

Upvotes: 4

Snow bunting

Reputation: 1221

You could use regex (regular expressions) and Python's eval function.

Let's assume we have

d = {'A1': '=A2+A3',
     'A2': '=SUM(A3:A5)',
     'A3': 3,
     'A4': 6,
     'A5': -1,
     ...}

the entire function would look like

import re
def g(s):
    """Excel-like evaluation with recurrence"""
    if isinstance(s,(int, float)):
        return s
    s=re.sub(r'=', '', s)
    s=re.sub(r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)','sum([g(d[chr(i)+str(j)]) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)])',s)
    s=re.sub(r'([A-Z][0-9])',r'g(d["\1"])',s)
    return eval(s)

For example

>>> print(g(d['A1']))
11

Let's have a look at the single steps:

First of all we get rid of the =. One could alternatively write a test and only evaluate the formula if it starts with a =, up to the reader.
With re.sub(r'([A-Z][0-9])', r'g(d["\g<1>"])', any_string) one replaces a group with one capital letter and a number (e.x. 'A3') with the dictionary look-up of it (e.x. 'g(d["A3"])')
- we need to apply g() again if the new cell value is still a formula (recurrence)
- Note: If there are entries like 'A$3' or '$AB$4' as they occur in Excel, one could use r'$?([A-Z]+)$?([0-9])' as search pattern and r'd["\g<1>\g<2>"]' as substitution.
Then we can evaluate this string with eval(). Up to now one can use all implemented Python operations such as +, -, *, /, //, %, etc.
All other functions need to be implemented manually by substituting them with Python expressions. Her is an example for SUM(A3:B10):

With r'SUM$([A-Z])([0-9]):([A-Z])([0-9])$' we search for the formula.

[chr(i)+str(j) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)]] gives us all table indices that are in the sum. Then we apply g(d[...]) on each of them (recurrence) and take the sum.

This can be extended to any Excel formula of course.

Upvotes: 2

taleinat

Reputation: 8701

You'll need something to parse Excel formulas and convert them into a form which allows executing the calculations.

A quick search brings up pycel as the most promising Python library for this. It does not support all of Excel's functions and syntax, but it should probably support what you need, and it definitely supports the formulas in the example you posted.

Also see this answer to a similar SO question. As it mentions, you can also actually connect to Excel, have it execute all of the calculations, and then just read the results. One way to do this is using the win32com library, as detailed in the previously mentioned answer.

Upvotes: 0

Excel formula evaluation in Pandas

Answers (3)

Related Questions