Map user-defined function on multiple polars columns

Question

I am doing a bit of data munging on a polars.DataFrame. I could write the same expression once per column, but I would ideally like to cut down on that repetition, so I was thinking of creating a user-defined function that just plugs in the column names.

I know that polars tends to be reluctant to let people bring in user-defined functions (and for good reasons), but it feels tedious to write out the same expression over and over with only the column name changed.

So let's say that I have a polars dataframe like this:

import polars as pl
df = pl.DataFrame({
    'a':['Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree'],
    'b':['Strongly Agree', 'Agree', 'Disagree', 'Strongly Disagree'],
    'c':['Agree', 'Strongly Agree', 'Strongly Disagree', 'Disagree']
})

I could use a when-then-otherwise expression to convert each of these to a numeric column, shown here for column a:

df_clean = df.with_columns(
    pl.when(
        pl.col('a') == 'Strongly Disagree'
    ).then(
        pl.lit(1)
    ).when(
        pl.col('a') == 'Disagree'
    ).then(
        pl.lit(2)
    ).when(
        pl.col('a') == 'Agree'
    ).then(
        pl.lit(3)
    ).when(
        pl.col('a') == 'Strongly Agree'
    ).then(
        pl.lit(4)
    )
)

But I don't want to write this out two more times.

So I was thinking I could write a function and map it over a, b, and c, but this seems like it wouldn't work.

Anyone have any advice for the most efficient way to do this?

Wayoshi · Accepted Answer

See replace, which can be broadcast to whatever columns you want, and does the job succinctly:

df_clean = df.with_columns(
    pl.all().replace(
        {'Strongly Disagree': 1, 'Disagree': 2, 'Agree': 3, 'Strongly Agree': 4}
    )
)

shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 3   │
│ 2   ┆ 3   ┆ 4   │
│ 3   ┆ 2   ┆ 1   │
│ 4   ┆ 1   ┆ 2   │
└─────┴─────┴─────┘

If you want to rename the columns like in your follow-up answer, you certainly can with a similar approach:

columns_to_convert = ['a', 'b', 'c']
new_column_names = ['x', 'y', 'z']
md = {'Much worse' : -3, ...} # whatever values here

df_clean = df.with_columns(
    pl.col(old_col).replace(md).alias(new_col)
    for old_col, new_col in zip(columns_to_convert, new_column_names)
)
