Michael WS
Michael WS

Reputation: 2617

polars equivalent to pandas groupby shift()

Is there an equivalent way to to df.groupby().shift in polars? Use pandas.shift() within a group

Upvotes: 10

Views: 6577

Answers (1)

user18559875
user18559875

Reputation:

You can use the over expression to accomplish this in Polars. Using the example from the link...

import polars as pl

df = pl.DataFrame({
    "a": ["a", "a", "b", "b", "b"],
    "b": [1, 2, 3, 5, 3],
    "c": [5, 4, 3, 2, 1],
})

df.with_columns(
    pl.col("b").shift().over("a").alias("prev_value")
)
shape: (5, 4)
┌─────┬─────┬─────┬────────────┐
│ a   ┆ b   ┆ c   ┆ prev_value │
│ --- ┆ --- ┆ --- ┆ ---        │
│ str ┆ i64 ┆ i64 ┆ i64        │
╞═════╪═════╪═════╪════════════╡
│ a   ┆ 1   ┆ 5   ┆ null       │
│ a   ┆ 2   ┆ 4   ┆ 1          │
│ b   ┆ 3   ┆ 3   ┆ null       │
│ b   ┆ 5   ┆ 2   ┆ 3          │
│ b   ┆ 3   ┆ 1   ┆ 5          │
└─────┴─────┴─────┴────────────┘

To perform this on more than one column, you can specify the columns in the pl.col expression, and then use a prefix/suffix to name the new columns. For example:

df.with_columns(
    pl.col("b", "c").shift().over("a").name.prefix("prev_")
)
shape: (5, 5)
┌─────┬─────┬─────┬────────┬────────┐
│ a   ┆ b   ┆ c   ┆ prev_b ┆ prev_c │
│ --- ┆ --- ┆ --- ┆ ---    ┆ ---    │
│ str ┆ i64 ┆ i64 ┆ i64    ┆ i64    │
╞═════╪═════╪═════╪════════╪════════╡
│ a   ┆ 1   ┆ 5   ┆ null   ┆ null   │
│ a   ┆ 2   ┆ 4   ┆ 1      ┆ 5      │
│ b   ┆ 3   ┆ 3   ┆ null   ┆ null   │
│ b   ┆ 5   ┆ 2   ┆ 3      ┆ 3      │
│ b   ┆ 3   ┆ 1   ┆ 5      ┆ 2      │
└─────┴─────┴─────┴────────┴────────┘

Using multiple values with over

Let"s use this data.

df = pl.DataFrame(
    {
        "id": [1] * 5 + [2] * 5,
        "date": ["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01", "2020-02-01"] * 2,
        "value1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "value2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    }
).with_columns(pl.col("date").str.to_date())
df
shape: (10, 4)
┌─────┬────────────┬────────┬────────┐
│ id  ┆ date       ┆ value1 ┆ value2 │
│ --- ┆ ---        ┆ ---    ┆ ---    │
│ i64 ┆ date       ┆ i64    ┆ i64    │
╞═════╪════════════╪════════╪════════╡
│ 1   ┆ 2020-01-01 ┆ 1      ┆ 10     │
│ 1   ┆ 2020-01-01 ┆ 2      ┆ 20     │
│ 1   ┆ 2020-02-01 ┆ 3      ┆ 30     │
│ 1   ┆ 2020-02-01 ┆ 4      ┆ 40     │
│ 1   ┆ 2020-02-01 ┆ 5      ┆ 50     │
│ 2   ┆ 2020-01-01 ┆ 6      ┆ 60     │
│ 2   ┆ 2020-01-01 ┆ 7      ┆ 70     │
│ 2   ┆ 2020-02-01 ┆ 8      ┆ 80     │
│ 2   ┆ 2020-02-01 ┆ 9      ┆ 90     │
│ 2   ┆ 2020-02-01 ┆ 10     ┆ 100    │
└─────┴────────────┴────────┴────────┘

We can place a list of our grouping variables in the over expression (as well as a list in our pl.col expression). Polars will run them all in parallel.

df.with_columns(
    pl.col("value1", "value2").shift().over("id", "date").name.prefix("prev_"),
    pl.col("value1", "value2").diff().over("id", "date").name.suffix("_diff")
)
shape: (10, 8)
┌─────┬────────────┬────────┬────────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ id  ┆ date       ┆ value1 ┆ value2 ┆ prev_value1 ┆ prev_value2 ┆ value1_diff ┆ value2_diff │
│ --- ┆ ---        ┆ ---    ┆ ---    ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│ i64 ┆ date       ┆ i64    ┆ i64    ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞═════╪════════════╪════════╪════════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ 1   ┆ 2020-01-01 ┆ 1      ┆ 10     ┆ null        ┆ null        ┆ null        ┆ null        │
│ 1   ┆ 2020-01-01 ┆ 2      ┆ 20     ┆ 1           ┆ 10          ┆ 1           ┆ 10          │
│ 1   ┆ 2020-02-01 ┆ 3      ┆ 30     ┆ null        ┆ null        ┆ null        ┆ null        │
│ 1   ┆ 2020-02-01 ┆ 4      ┆ 40     ┆ 3           ┆ 30          ┆ 1           ┆ 10          │
│ 1   ┆ 2020-02-01 ┆ 5      ┆ 50     ┆ 4           ┆ 40          ┆ 1           ┆ 10          │
│ 2   ┆ 2020-01-01 ┆ 6      ┆ 60     ┆ null        ┆ null        ┆ null        ┆ null        │
│ 2   ┆ 2020-01-01 ┆ 7      ┆ 70     ┆ 6           ┆ 60          ┆ 1           ┆ 10          │
│ 2   ┆ 2020-02-01 ┆ 8      ┆ 80     ┆ null        ┆ null        ┆ null        ┆ null        │
│ 2   ┆ 2020-02-01 ┆ 9      ┆ 90     ┆ 8           ┆ 80          ┆ 1           ┆ 10          │
│ 2   ┆ 2020-02-01 ┆ 10     ┆ 100    ┆ 9           ┆ 90          ┆ 1           ┆ 10          │
└─────┴────────────┴────────┴────────┴─────────────┴─────────────┴─────────────┴─────────────┘

Upvotes: 24

Related Questions