polars equivalent of pandas.pivot_table.reindex

Question

What is the polars equivalent of pandas.pivot_table.reindex? I am trying to convert my pandas code to polars and I hit a snag.

Modified example from polars user guide

import polars as pl
import pandas as pd

df = pl.DataFrame(
    {
        "foo": ["A", "A", "B", "B", "C"],
        "N": [1, 2, 2, 4, 2],
        "bar": ["k", "l", "n", "n", "o"],
    }
)

df_pl = df.pivot(index="foo", columns="bar", 
                 values="N", aggregate_function="max")
print (df_pl)

shape: (3, 5)
┌─────┬──────┬──────┬──────┬──────┐
│ foo ┆ k    ┆ l    ┆ n    ┆ o    │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪══════╪══════╪══════╪══════╡
│ A   ┆ 1    ┆ 2    ┆ null ┆ null │
│ B   ┆ null ┆ null ┆ 4    ┆ null │
│ C   ┆ null ┆ null ┆ null ┆ 2    │
└─────┴──────┴──────┴──────┴──────┘

equivalent code of pandas for the above polars pivot output

df_pd = pd.pivot_table(df.to_pandas(), 
                       values = "N", 
                       index = "foo", 
                       columns = ["bar"], 
                       aggfunc= 'max', 
                       sort=True)
print (df_pd)

bar    k    l    n    o
foo                    
A    1.0  2.0  NaN  NaN
B    NaN  NaN  4.0  NaN
C    NaN  NaN  NaN  2.0

Now, I would like to reindex the pivot table in certain order. As you can see, pandas added the new column with all NaN's as the data for column "m" doesn't exist in the original df.

reindex_list = ['k', 'm', 'l', 'n', 'o']
df_pd_reindexed = df_pd.reindex(reindex_list, axis = 1)
print (df_pd_reindexed)

bar    k   m    l    n    o
foo                        
A    1.0 NaN  2.0  NaN  NaN
B    NaN NaN  NaN  4.0  NaN
C    NaN NaN  NaN  NaN  2.0

How to get similar output using polars?

I tried the df.select method to order the columns, but it doesn't help in this scenario as the column is missing and polars throws an error which is obvious. Unless, I need to add the missing columns manually and populate them with nulls and then call the select method. I am looking for a simpler method than adding the missing columns manually.

df_pl = df_pl.select(["foo"] + reindex_list)

Traceback (most recent call last):
  File "", line 1, in 
  File "C:\Python39\lib\site-packages\polars\dataframe\frame.py", line 8097, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
  File "C:\Python39\lib\site-packages\polars\lazyframe\frame.py", line 1730, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ColumnNotFoundError: m

Error originated just after this operation:
DF ["foo", "k", "l", "n"]; PROJECT */5 COLUMNS; SELECTION: "None"

Any guidance is much appreciated. Thanks

polars equivalent of pandas.pivot_table.reindex

Answers (1)

Reindexing polars DataFrame columns (with auto-filling logic for missing columns)

Related Questions