buggsbunny4
buggsbunny4

Reputation: 181

polars equivalent of pandas.pivot_table.reindex

What is the polars equivalent of pandas.pivot_table.reindex? I am trying to convert my pandas code to polars and I hit a snag.

Modified example from polars user guide

import polars as pl
import pandas as pd

df = pl.DataFrame(
    {
        "foo": ["A", "A", "B", "B", "C"],
        "N": [1, 2, 2, 4, 2],
        "bar": ["k", "l", "n", "n", "o"],
    }
)

df_pl = df.pivot(index="foo", columns="bar", 
                 values="N", aggregate_function="max")
print (df_pl)

shape: (3, 5)
┌─────┬──────┬──────┬──────┬──────┐
│ foo ┆ k    ┆ l    ┆ n    ┆ o    │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪══════╪══════╪══════╪══════╡
│ A   ┆ 1    ┆ 2    ┆ null ┆ null │
│ B   ┆ null ┆ null ┆ 4    ┆ null │
│ C   ┆ null ┆ null ┆ null ┆ 2    │
└─────┴──────┴──────┴──────┴──────┘

equivalent code of pandas for the above polars pivot output

df_pd = pd.pivot_table(df.to_pandas(), 
                       values = "N", 
                       index = "foo", 
                       columns = ["bar"], 
                       aggfunc= 'max', 
                       sort=True)
print (df_pd)

bar    k    l    n    o
foo                    
A    1.0  2.0  NaN  NaN
B    NaN  NaN  4.0  NaN
C    NaN  NaN  NaN  2.0

Now, I would like to reindex the pivot table in certain order. As you can see, pandas added the new column with all NaN's as the data for column "m" doesn't exist in the original df.

reindex_list = ['k', 'm', 'l', 'n', 'o']
df_pd_reindexed = df_pd.reindex(reindex_list, axis = 1)
print (df_pd_reindexed)

bar    k   m    l    n    o
foo                        
A    1.0 NaN  2.0  NaN  NaN
B    NaN NaN  NaN  4.0  NaN
C    NaN NaN  NaN  NaN  2.0

How to get similar output using polars?

I tried the df.select method to order the columns, but it doesn't help in this scenario as the column is missing and polars throws an error which is obvious. Unless, I need to add the missing columns manually and populate them with nulls and then call the select method. I am looking for a simpler method than adding the missing columns manually.

df_pl = df_pl.select(["foo"] + reindex_list)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python39\lib\site-packages\polars\dataframe\frame.py", line 8097, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
  File "C:\Python39\lib\site-packages\polars\lazyframe\frame.py", line 1730, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ColumnNotFoundError: m

Error originated just after this operation:
DF ["foo", "k", "l", "n"]; PROJECT */5 COLUMNS; SELECTION: "None"

Any guidance is much appreciated. Thanks

Upvotes: 4

Views: 233

Answers (1)

John Collins
John Collins

Reputation: 2961

Reindexing polars DataFrame columns (with auto-filling logic for missing columns)

One possible approach would be to add any needed new blank (i.e., all null) columns using pl.DataFrame.with_columns chained to a pl.DataFrame.select for enforcing specified columns order:

import polars as pl

df = pl.DataFrame(
    {
        "foo": ["A", "A", "B", "B", "C"],
        "N": [1, 2, 2, 4, 2],
        "bar": ["k", "l", "n", "n", "o"],
    }
)
print(df)

df_pl = df.pivot(
    index="foo", columns="bar", values="N", aggregate_function="max"
)
print(df_pl)

reindex_list = ["k", "m", "l", "n", "o"]

df_pl_reindexed = df_pl.with_columns(
    pl.lit(None).alias(c)
    for c in list(set(reindex_list) - set(df_pl.columns))
).select(["foo"] + reindex_list)

print(df_pl_reindexed)

gives:

shape: (5, 3)
┌─────┬─────┬─────┐
│ foo ┆ N   ┆ bar │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ A   ┆ 1   ┆ k   │
│ A   ┆ 2   ┆ l   │
│ B   ┆ 2   ┆ n   │
│ B   ┆ 4   ┆ n   │
│ C   ┆ 2   ┆ o   │
└─────┴─────┴─────┘
shape: (3, 5)
┌─────┬──────┬──────┬──────┬──────┐
│ foo ┆ k    ┆ l    ┆ n    ┆ o    │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪══════╪══════╪══════╪══════╡
│ A   ┆ 1    ┆ 2    ┆ null ┆ null │
│ B   ┆ null ┆ null ┆ 4    ┆ null │
│ C   ┆ null ┆ null ┆ null ┆ 2    │
└─────┴──────┴──────┴──────┴──────┘
shape: (3, 6)
┌─────┬──────┬──────┬──────┬──────┬──────┐
│ foo ┆ k    ┆ m    ┆ l    ┆ n    ┆ o    │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ null ┆ i64  ┆ i64  ┆ i64  │
╞═════╪══════╪══════╪══════╪══════╪══════╡
│ A   ┆ 1    ┆ null ┆ 2    ┆ null ┆ null │
│ B   ┆ null ┆ null ┆ null ┆ 4    ┆ null │
│ C   ┆ null ┆ null ┆ null ┆ null ┆ 2    │
└─────┴──────┴──────┴──────┴──────┴──────┘

Upvotes: 1

Related Questions