Horseman
Horseman

Reputation: 427

How can I use when, then and otherwise with multiple conditions in polars?

I have a data set with three columns. Column A is to be checked for strings. If the string matches foo or spam, the values in the same row for the other two columns L and G should be changed to XX. For this I have tried the following.

df = pl.DataFrame(
    {
        "A": ["foo", "ham", "spam", "egg",],
        "L": ["A54", "A12", "B84", "C12"],
        "G": ["X34", "C84", "G96", "L6",],
    }
)
print(df)

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ A54 ┆ X34 │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ B84 ┆ G96 │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘

expected outcome

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ XX  ┆ XX  │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ XX  ┆ XX  │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘

I tried this

df = df.with_columns(
    pl.when((pl.col("A") == "foo") | (pl.col("A") == "spam"))
    .then((pl.col("L")= "XX") & (pl.col( "G")= "XX"))
    .otherwise((pl.col("L"))&(pl.col( "G")))
)

However, this does not work. Can someone help me with this?

Upvotes: 9

Views: 8419

Answers (1)

jqurious
jqurious

Reputation: 21534

For setting multiple columns to the same value you could use:

df.with_columns(
   pl.when(pl.col("A").is_in(["foo", "spam"]))
     .then(pl.lit("XX"))
     .otherwise(pl.col("L", "G"))
     .name.keep()
)
shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ XX  ┆ XX  │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ XX  ┆ XX  │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘
  • .is_in() can be used instead of multiple == x | == y chains

If you need different values, you can pack them into a struct and extract/unnest the fields.

df.with_columns(
   pl.when(pl.col("A").is_in(["foo", "spam"]))
     .then(pl.struct(L = pl.lit("AAA"), G = pl.lit("BBB")))
     .otherwise(pl.struct("L", "G"))
     .struct.field("L", "G") # or .struct.unnest()
)
shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ AAA ┆ BBB │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ AAA ┆ BBB │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘

Upvotes: 10

Related Questions