This code works and returns the expected result.
import polars as pl

df = pl.DataFrame({
    'A': [1, 2, 3, 3, 2, 1],
    'B': [1, 1, 1, 2, 2, 2],
})

(df
    #.lazy()
    .group_by('B')
    .map_groups(lambda x:
        x.with_columns(
            pl.col("A").shift(i).alias(f"A_lag_{i}") for i in range(3)
        ),
        #schema=None
    )
    .with_columns(
        pl.col(f'A_lag_{i}') / pl.col('A') for i in range(3)
    )
    #.collect()
)
However, if you uncomment .lazy(), schema=None and .collect() so that the query runs in lazy mode, you get a ColumnNotFoundError: A_lag_0
I've tried a few versions of this code, but I can't entirely understand if I'm doing something wrong, or whether this is a bug in Polars.
Upvotes: 2
This doesn't address the error that you are receiving, but the more idiomatic way to express this in Polars is to use the over expression. For example:
(
    df
    .lazy()
    .with_columns(
        pl.col("A").shift(i).over('B').alias(f"A_lag_{i}")
        for i in range(3)
    )
    .with_columns(
        (pl.col(f"A_lag_{i}") / pl.col("A")).name.suffix('_result')
        for i in range(3)
    )
    .collect()
)
shape: (6, 8)
┌─────┬─────┬─────────┬─────────┬─────────┬────────────┬───────────┬───────────┐
│ A   ┆ B   ┆ A_lag_0 ┆ A_lag_1 ┆ A_lag_2 ┆ A_lag_0_re ┆ A_lag_1_r ┆ A_lag_2_r │
│ --- ┆ --- ┆ ---     ┆ ---     ┆ ---     ┆ sult       ┆ esult     ┆ esult     │
│ i64 ┆ i64 ┆ i64     ┆ i64     ┆ i64     ┆ ---        ┆ ---       ┆ ---       │
│     ┆     ┆         ┆         ┆         ┆ f64        ┆ f64       ┆ f64       │
╞═════╪═════╪═════════╪═════════╪═════════╪════════════╪═══════════╪═══════════╡
│ 1   ┆ 1   ┆ 1       ┆ null    ┆ null    ┆ 1.0        ┆ null      ┆ null      │
│ 2   ┆ 1   ┆ 2       ┆ 1       ┆ null    ┆ 1.0        ┆ 0.5       ┆ null      │
│ 3   ┆ 1   ┆ 3       ┆ 2       ┆ 1       ┆ 1.0        ┆ 0.666667  ┆ 0.333333  │
│ 3   ┆ 2   ┆ 3       ┆ null    ┆ null    ┆ 1.0        ┆ null      ┆ null      │
│ 2   ┆ 2   ┆ 2       ┆ 3       ┆ null    ┆ 1.0        ┆ 1.5       ┆ null      │
│ 1   ┆ 2   ┆ 1       ┆ 2       ┆ 3       ┆ 1.0        ┆ 2.0       ┆ 3.0       │
└─────┴─────┴─────────┴─────────┴─────────┴────────────┴───────────┴───────────┘
Upvotes: 2