Reputation: 23
I queried some data as pd.dataframe
from db and save it with pd.ExcelWriter
in xlsx.writer engine to a formatted xlsx template.
The saved data is then referenced in another sheet by dynamic formula and other formulas.
In the excel, there are two sheets : "Raw Data" and "Format Data":
"Raw Data" Stores the fetched data, "Format Data" contains the formula and reference cells from "Raw Data"
After the ExcelWriter processed the excel file, the dynamic formula in "Processed Data" changed to array formula, which denied the formula to expand. For example, there is a formula:
=FILTER(UNIQUE('Raw Data'!$I$4:$I$2000),UNIQUE('Raw Data'!$I$4:$I$2000)<>0)
which should return all unique data from I4:I2000.
However, the excel after ExcelWriter process changed the dynamic formula to array formula in range I4:I400 as the original template only have 396 unique values. Even if the updated data source have more than 396 unique values, it could only display 396 items.
It would be nice if someone could kindly advise the cause of this issue and provide a solution to stop the change of the dynamic formulas.
Remarks: I am trying not to do the data processing with python so please don't comment on that.
writer = pd.ExcelWriter(excel_path, engine='openpyxl', mode='a', if_sheet_exists="overlay",)
for config in excelUpdateConfigs:
result = fetchSQL(db_conn, config["sql"])
result = result.astype(config["dtype"])
result.to_excel(writer, sheet_name="Raw Data", float_format="%.5f", startrow=2, startcol=config["startcol"], header=True, index=False, engine="io.excel.xlsx.writer")
writer.close()
I have tried to do some research online but I can't see any related articles about dynamic formula right now.
I doubt that the low level logic of ExcelWriter is creating a new excel file based on the given path instead of editing on top of it so it breaks some of the equations and inserted items. Please advise if I'm wrong.
Upvotes: 2
Views: 225
Reputation: 1994
I tried to reproduce your problem using below code:
import pandas as pd
import numpy as np
excel_path = 'test.xlsx'
data = np.random.uniform(low=-10, high=10, size=400)
result = pd.DataFrame(data, columns=['close'])
writer = pd.ExcelWriter(excel_path, engine='openpyxl', mode='a', if_sheet_exists="overlay",)
result.to_excel(writer, sheet_name="Raw Data", float_format="%.5f", startrow=2, startcol=8, header=True, index=False, engine="io.excel.xlsx.writer")
writer.close()
test.xlsx has two tabs: "Raw Data", "Format Data". In format data cell A1, I input below
=FILTER(UNIQUE('Raw Data'!$I$4:$I$2000),UNIQUE('Raw Data'!$I$4:$I$2000)<>0)
After running the code, I saw filter function changed to this:
So I decided to change the way FILTER
is applied in "Format Data" sheet in the first place, that is, select A1:A2000
in "Format Data", input =FILTER(UNIQUE('Raw Data'!$I$4:$I$2000),UNIQUE('Raw Data'!$I$4:$I$2000)<>0)
in formula bar, press Ctrl + Shift + Enter.
This way openpyxl won't change it, and it will always accommodate 2000 datapoints.
Upvotes: 0