NicoAdrian
NicoAdrian

Reputation: 1044

xlsxwriter chart categories file size

The following code takes a CSV as input and output an XLSX containing a bar chart:

import pandas
import sys
import openpyxl
from openpyxl.chart import BarChart, Reference

# ---- Data preparation ----
path = sys.argv[1]
df = pandas.read_csv(path, sep = "\t")
df = df[pandas.notnull(df['response_time'])]
df.loc[:, "datetime"].replace(to_replace = r"(.*T)", value = r"", inplace = True, regex = True)
df = pandas.pivot(df.datetime, df.name + '-' + df.type, df.response_time)

# ---- Prepare writer, workbook and worksheet ----
writer = pandas.ExcelWriter("test.xlsx", engine = 'xlsxwriter')
df.to_excel(writer, sheet_name = "graph")
workbook = writer.book
worksheet = writer.sheets["graph"]

# ---- USING OPENPYXL ----
chart1 = BarChart()
chart1.x_axis_title = "DateTime"
chart1.x_axis_title = "Response time"
data = Reference(workbook.active, min_col = 2, min_row = 1, max_col = len(df.columns) + 1, max_row = workbook.active.max_row)
xval = Reference(workbook.active, min_col = 1, min_row = 2, max_row = workbook.active.max_row)
chart1.add_data(data, titles_from_data = True)
chart1.set_categories(xval)
workbook.active.add_chart(chart1, "D4")
workbook.save("test.xlsx")

# ---- USING XLSXWRITER ----
chart = workbook.add_chart({'type' : 'column'})
for i in range(1, len(df.columns) + 1):
    chart.add_series({
        'name' : ['graph', 0, i],
        'values' : ['graph', 1, i, len(df.index), i],
        'categories': ['graph', 1, 0, len(df.index), 0],
    })
chart.set_x_axis({'name' : 'DateTime'})
chart.set_y_axis({'name' : 'Response Time'})
worksheet.insert_chart('D4', chart)
writer.save()

Output graph is:

enter image description here

Both the openpyxl and xlsxwriter versions produce the exact same graph. The problem is that the xlsxwriter version output graph is 10MB but the openpyxl is only 3MB.

After investigating, I found out that it's the 'categories' entry in the xlsxwriter version that increases the weight considerably.

When I comment it, my X axis is now a serie from 1 to n but the output is only 1.6MB, how is that possible ? How can I make the output more lightweight using xlsxwriter ?

Upvotes: 1

Views: 387

Answers (1)

jmcnamara
jmcnamara

Reputation: 41644

The XlsxWriter file is probably bigger due to the fact that it stores the chart category and data values for the chart as well as for the worksheet, like Excel does. This gives better fidelity with Excel produced files, which is a goal of XlsxWriter, and also causes less issues when the charts are embedded in other applications, such as PowerPoint.

I’d guess from your analysis that openpyxl doesn’t store the additional data.

If you resave the XlsxWriter, or openpyxl, file in Excel I would expect that you end up with a file size that is approximately the same as the XlsxWriter file.

Upvotes: 1

Related Questions