Reputation: 41

How to create a new column within a polars DataFrame that is equal to a list?

I am currently trying to create a new column within a polars dataframe (df). Within my df, there are many many rows, and within this new column I only want my existing list to populate wherever certain conditions are met.

I know that in pandas, I could do something like the following (where 'output' is my new column and assuming list is the same length as the filtered df) :

df = pd.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
conditions = ((df['cond1']>=1) & (df['cond2'] == 'Yes'))
list_new = [1,2,3,4,5]

df.loc[conditions, 'output'] = list_new

How would I do this same thing but within polars? I want the total number of rows to stay the same, I just want to update the new column wherever the conditions are met

I tried a few variations of the following but nothing has seemed to work:

df_polars = df_polars.with_columns(pl.when(conditions).then(list_new).alias('output')

Upvotes: 4

Answers (3)

jqurious

Reputation: 21580

You could use .cum_sum() to generate the indexes for .when the conditions are true.

condition = (pl.col.cond1 >= 1) & (pl.col.cond2 == 'Yes')

df.with_columns(idx = pl.when(condition).then(condition.cum_sum()) - 1)

shape: (8, 3)
┌───────┬───────┬──────┐
│ cond1 ┆ cond2 ┆ idx  │
│ ---   ┆ ---   ┆ ---  │
│ i64   ┆ str   ┆ u32  │
╞═══════╪═══════╪══════╡
│ -1    ┆ No    ┆ null │
│ 0     ┆ No    ┆ null │
│ 1     ┆ No    ┆ null │
│ 2     ┆ Yes   ┆ 0    │ # <-
│ 2     ┆ Yes   ┆ 1    │ # <-
│ 3     ┆ Yes   ┆ 2    │ # <-
│ 4     ┆ Yes   ┆ 3    │ # <-
│ 5     ┆ Yes   ┆ 4    │ # <-
└───────┴───────┴──────┘

Which can be given to .gather() to get the corresponding values.

condition = (pl.col.cond1 >= 1) & (pl.col.cond2 == 'Yes')
values = pl.lit(pl.Series('output', [10, 20, 30, 40, 50]))

df.with_columns(
   values.gather(
      pl.when(condition).then(condition.cum_sum()) - 1
   )
)

shape: (8, 3)
┌───────┬───────┬────────┐
│ cond1 ┆ cond2 ┆ output │
│ ---   ┆ ---   ┆ ---    │
│ i64   ┆ str   ┆ i64    │
╞═══════╪═══════╪════════╡
│ -1    ┆ No    ┆ null   │
│ 0     ┆ No    ┆ null   │
│ 1     ┆ No    ┆ null   │
│ 2     ┆ Yes   ┆ 10     │
│ 2     ┆ Yes   ┆ 20     │
│ 3     ┆ Yes   ┆ 30     │
│ 4     ┆ Yes   ┆ 40     │
│ 5     ┆ Yes   ┆ 50     │
└───────┴───────┴────────┘

Upvotes: 0

Cameron Riddell

Reputation: 13427

Can't avoid the use of with_row_index but one can also use DataFrame.update here

import polars as pl
from polars import col

df = pl.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
list_new = [1, 2, 3, 4, 5]

df = df.with_row_index()
subset = df.select(
   col('index').filter(col('cond1') >= 1, col('cond2') == 'Yes'),
   pl.Series('output', list_new)
)

print(
    df.with_columns(output=None)
    .update(subset, on='index', how='outer')
    .drop('index')
)
# shape: (8, 3)
# ┌───────┬───────┬────────┐
# │ cond1 ┆ cond2 ┆ output │
# │ ---   ┆ ---   ┆ ---    │
# │ i64   ┆ str   ┆ i64    │
# ╞═══════╪═══════╪════════╡
# │ -1    ┆ No    ┆ null   │
# │ 0     ┆ No    ┆ null   │
# │ 1     ┆ No    ┆ null   │
# │ 2     ┆ Yes   ┆ 1      │
# │ 2     ┆ Yes   ┆ 2      │
# │ 3     ┆ Yes   ┆ 3      │
# │ 4     ┆ Yes   ┆ 4      │
# │ 5     ┆ Yes   ┆ 5      │
# └───────┴───────┴────────┘

Upvotes: 2

roman

Reputation: 117540

.with_row index() to get row indexes.
.filter() to get rows based on conditions.

Now we get DataFrame with proper row indexes and new data:

conditions = [pl.col('cond1') >= 1, pl.col('cond2') == 'Yes']

df_output = (
    df.with_row_index()
    .filter(conditions).select('index', output=pl.Series(list_new)
)

┌───────┬────────┐
│ index ┆ output │
│ ---   ┆ ---    │
│ u32   ┆ i64    │
╞═══════╪════════╡
│ 3     ┆ 1      │
│ 4     ┆ 2      │
│ 5     ┆ 3      │
│ 6     ┆ 4      │
│ 7     ┆ 5      │
└───────┴────────┘

.join() to add column to original DataFrame:

(
    df.with_row_index()
    .join(df_output, on='index', how='left')
    .drop('index')
)

┌───────┬───────┬────────┐
│ cond1 ┆ cond2 ┆ output │
│ ---   ┆ ---   ┆ ---    │
│ i64   ┆ str   ┆ i64    │
╞═══════╪═══════╪════════╡
│ -1    ┆ No    ┆ null   │
│ 0     ┆ No    ┆ null   │
│ 1     ┆ No    ┆ null   │
│ 2     ┆ Yes   ┆ 1      │
│ 2     ┆ Yes   ┆ 2      │
│ 3     ┆ Yes   ┆ 3      │
│ 4     ┆ Yes   ┆ 4      │
│ 5     ┆ Yes   ┆ 5      │
└───────┴───────┴────────┘

Upvotes: 5

How to create a new column within a polars DataFrame that is equal to a list?

Answers (3)

Related Questions