Reputation: 41
I have a polars dataframe like this:
Users | 2021-04-27 | 2021-04-28 | 2021-04-29 | 2021-04-30 | ... |
---|---|---|---|---|---|
A | 2 | 2 | 5 | 6 | ... |
how can I do to have multi level for columns date and to get a excel file like this ? expected format
I already tried pandas.MultiIndex
.
Upvotes: 4
Views: 1074
Reputation: 21580
Pandas and Polars both use xlsxwriter
to write excel files.
(There are some nice examples in the docs: https://xlsxwriter.readthedocs.io/working_with_polars.html)
To combine cells , pandas uses xlsxwriter's .merge_range()
functionality.
It does not look like Polars currently offers a way to use this, it seems like you would have to discuss it as a feature request.
Some of the particular details of your example have been hardcoded, but the general idea could look something like this:
As @Hericks mentioned, in Polars you would usually reshape from wide-to-long e.g. with .unpivot()
import xlsxwriter
import polars as pl
df = pl.from_repr("""
┌───────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ Users ┆ 2021-04-27 ┆ 2021-04-28 ┆ 2021-04-29 ┆ 2021-04-30 ┆ 2021-03-01 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪════════════╪════════════╪════════════╪════════════╪════════════╡
│ A ┆ 2 ┆ 2 ┆ 5 ┆ 6 ┆ 1 │
└───────┴────────────┴────────────┴────────────┴────────────┴────────────┘
""")
df_long = (
df.unpivot(index="Users", variable_name="date")
.with_columns(
ym = pl.col("date").str.to_date().dt.to_string("%Y-%m"),
d = pl.col("date").str.to_date().dt.to_string("%d"),
)
.sort("ym") # to order columns (should be a proper "date sort" instead)
)
┌───────┬────────────┬───────┬─────────┬─────┐
│ Users ┆ date ┆ value ┆ ym ┆ d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ str ┆ str │
╞═══════╪════════════╪═══════╪═════════╪═════╡
│ A ┆ 2021-03-01 ┆ 1 ┆ 2021-03 ┆ 01 │
│ A ┆ 2021-04-27 ┆ 2 ┆ 2021-04 ┆ 27 │
│ A ┆ 2021-04-28 ┆ 2 ┆ 2021-04 ┆ 28 │
│ A ┆ 2021-04-29 ┆ 5 ┆ 2021-04 ┆ 29 │
│ A ┆ 2021-04-30 ┆ 6 ┆ 2021-04 ┆ 30 │
└───────┴────────────┴───────┴─────────┴─────┘
levels = df_long.group_by("ym", maintain_order=True).agg("d")
shape: (2, 2)
┌─────────┬──────────────────────┐
│ ym ┆ d │
│ --- ┆ --- │
│ str ┆ list[str] │
╞═════════╪══════════════════════╡
│ 2021-03 ┆ ["01"] │
│ 2021-04 ┆ ["27", "28", … "30"] │
└─────────┴──────────────────────┘
Essentially, it just keeps track of the current offset/widths and any multi-width entries are written using .merge_range()
instead of .write()
with xlsxwriter.Workbook("polars-multi-level.xlsx") as book:
sheet = book.add_worksheet()
cell_format = book.add_format(dict(
bold = 1,
border = 1,
align = "center",
valign = "vcenter",
))
sheet.merge_range("A1:A2", "Users", cell_format)
levels = df_long.group_by("ym", maintain_order=True).agg("d")
row = 0
offset = 1 # col 0 already contains "Users"
for idx, (top_level, sub_levels) in enumerate(levels.iter_rows()):
data = top_level
first_col = offset + idx
width = len(sub_levels)
if width > 1:
last_col = offset + width
sheet.merge_range(
*[row, first_col, row, last_col, data, cell_format]
)
for sub_idx, sub_level in enumerate(sub_levels):
sheet.write(row + 1, first_col + sub_idx, sub_level, cell_format)
offset += width
else:
sheet.write(row, first_col, data, cell_format)
sheet.write(row + 1, first_col, sub_levels[0], cell_format)
Upvotes: 4