Reputation: 6450
Say i have a dict like this whose key's are cell references in excel and value's are either formula or integer.
input_dict = {
"B25": "=B2*B4",
"C25": "=C2*C4",
"D25": "=D2*D4",
"E25": "=E2*E4",
"F25": "=F2*F4",
"G25": "=G2*G4",
"B22": 0,
"C22": "=SUM(B22:B28)",
"D22": "=SUM(C22:C28)",
"E22": "=SUM(D22:D28)",
"F22": "=SUM(E22:E28)",
"G22": "=SUM(F22:F28)",
"B28": "=B19*B20*B22",
"C28": "=C19*C20*C22",
"D28": "=D19*D20*D22",
"E28": "=E19*E20*E22",
"F28": "=F19*F20*F22",
"G28": "=G19*G20*G22",
"B2": 1000,
"C2": 900,
"D2": 880,
"E2": 860,
"F2": 840,
"G2": 800,
"B4": 0.95,
"C4": 0.90,
"D4": 0.80,
"E4": 0.80,
"F4": 0.70,
"G4": 0.60,
"B19": 0.001,
"C19": 0.001,
"D19": 0.001,
"E19": 0.001,
"F19": 0.001,
"G19": 0.002,
"B20": 4,
"C20": 3,
"D20": 4,
"E20": 4,
"F20": 3,
"G20": 4
}
How to perform the calculation on this type of data.
My approach was to convert the dict to DataFrame and perform the calculations on it but i am stuck.
df = pd.DataFrame(list(input_dict.items()))
The output of df is shown below.
0 1
0 G22 =SUM(F22:F28)
1 G4 0.6
2 F2 840
3 D2 880
4 C20 3
5 C4 0.9
6 B28 =B19*B20*B22
7 F25 =F2*F4
8 B25 =B2*B4
9 G25 =G2*G4
10 C28 =C19*C20*C22
11 G28 =G19*G20*G22
12 F22 =SUM(E22:E28)
13 C25 =C2*C4
14 B19 0.001
15 E4 0.8
16 D22 =SUM(C22:C28)
17 D4 0.8
18 G2 800
19 E28 =E19*E20*E22
20 D20 4
21 G20 4
22 E25 =E2*E4
23 F20 3
24 G19 0.002
25 E22 =SUM(D22:D28)
26 C2 900
27 D25 =D2*D4
28 E2 860
29 D28 =D19*D20*D22
30 C19 0.001
31 F28 =F19*F20*F22
32 B20 4
33 B2 1000
34 F4 0.7
35 E19 0.001
36 D19 0.001
37 B4 0.95
38 B22 0
39 F19 0.001
40 C22 =SUM(B22:B28)
41 E20 4
How to perform excel like calculations in Python ?
The expected output is as below
{
"B25": "950",
"C25": "810",
"D25": "704",
"E25": "688",
"F25": "588",
"G25": "480",
"B22": 0,
"C22": 950,
"D22": 1757.15,
"E22": 2454.1214,
"F22": 3710.908,
"G22": 4161.220736,
"B28": 0,
"C28": -2.85,
"D28": -7.0286,
"E28": -9.8164856,
"F28": -9.396914743,
"G28": -29.687264,
"B2": 1000,
"C2": 900,
"D2": 880,
"E2": 860,
"F2": 840,
"G2": 800,
"B4": 0.95,
"C4": 0.90,
"D4": 0.80,
"E4": 0.80,
"F4": 0.70,
"G4": 0.60,
"B19": 0.001,
"C19": 0.001,
"D19": 0.001,
"E19": 0.001,
"F19": 0.001,
"G19": 0.002,
"B20": 4,
"C20": 3,
"D20": 4,
"E20": 4,
"F20": 3,
"G20": 4
}
Upvotes: 2
Views: 7739
Reputation: 499
I have written a library, xlcalculator, that reads Excel files, interprets formulas with supported functions into Python and can subsequently evaluates the formulas.
An example if its use using the problem as you've stated;
input_dict = {
"B4": 0.95,
"B2": 1000,
"B19": 0.001,
"B20": 4,
# B21
"B22": 1,
"B23": 2,
"B24": 3,
"B25": "=B2*B4",
"B26": 5,
"B27": 6,
"B28": "=B19*B20*B22",
"C22": "=SUM(B22:B28)",
}
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
compiler = ModelCompiler()
my_model = compiler.read_and_parse_dict(input_dict)
evaluator = Evaluator(my_model)
for formula in my_model.formulae:
print("Formula", formula, "evaluates to", evaluator.evaluate(formula))
# cells need a sheet and Sheet1 is default.
evaluator.set_cell_value("Sheet1!B22", 100)
print("Formula B28 now evaluates to", evaluator.evaluate("Sheet1!B28"))
print("Formula C22 now evaluates to", evaluator.evaluate("Sheet1!C22"))
produces this output;
Formula Sheet1!B25 evaluates to 950.0
Formula Sheet1!B28 evaluates to 0.004
Formula Sheet1!C22 evaluates to 967.004
Formula B28 now evaluates to 0.4
Formula C22 now evaluates to 1066.4
Upvotes: 4
Reputation: 1221
You could use regex (regular expressions) and Python's eval
function.
Let's assume we have
d = {'A1': '=A2+A3',
'A2': '=SUM(A3:A5)',
'A3': 3,
'A4': 6,
'A5': -1,
...}
the entire function would look like
import re
def g(s):
"""Excel-like evaluation with recurrence"""
if isinstance(s,(int, float)):
return s
s=re.sub(r'=', '', s)
s=re.sub(r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)','sum([g(d[chr(i)+str(j)]) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)])',s)
s=re.sub(r'([A-Z][0-9])',r'g(d["\1"])',s)
return eval(s)
For example
>>> print(g(d['A1']))
11
Let's have a look at the single steps:
=
. One could alternatively write a test and only evaluate the formula if it starts with a =
, up to the reader.With re.sub(r'([A-Z][0-9])', r'g(d["\g<1>"])', any_string)
one
replaces a group with one capital letter and a number (e.x. 'A3')
with the dictionary look-up of it (e.x. 'g(d["A3"])')
g()
again if the new cell value is still a formula (recurrence) r'$?([A-Z]+)$?([0-9])'
as search pattern and r'd["\g<1>\g<2>"]'
as substitution.Then we can evaluate this string with eval()
. Up to now one can use all implemented Python operations such as +, -, *, /, //, %, etc.
All other functions need to be implemented manually by substituting them with Python expressions. Her is an example for SUM(A3:B10)
:
With r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)'
we search for the formula.
[chr(i)+str(j) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)]]
gives us all table indices that are in the sum.
Then we apply g(d[...])
on each of them (recurrence) and take the sum.
This can be extended to any Excel formula of course.
Upvotes: 2
Reputation: 8701
You'll need something to parse Excel formulas and convert them into a form which allows executing the calculations.
A quick search brings up pycel as the most promising Python library for this. It does not support all of Excel's functions and syntax, but it should probably support what you need, and it definitely supports the formulas in the example you posted.
Also see this answer to a similar SO question. As it mentions, you can also actually connect to Excel, have it execute all of the calculations, and then just read the results. One way to do this is using the win32com library, as detailed in the previously mentioned answer.
Upvotes: 0