bwooster

Reputation: 171

Add multiple columns from one function call in Python Polars

I would like to add multiple columns at once to a Polars dataframe, where each column is derived from the same per-row object. I want to create that object only once per row and then return the result of a different method call on it for each column. Here is a simplified example using a list built from a range:

import polars as pl

df = pl.DataFrame({
    'x': [11, 22],
})

def uses_object(x):
    r = list(range(0, x))
    c10 = r.count(10)
    c12 = r.count(12)
    return c10, c12

df = df.with_columns(
    count_of_10 = pl.col('x').map_elements(lambda x: uses_object(x)[0]),
    count_of_12 = pl.col('x').map_elements(lambda x: uses_object(x)[1]),
)

print(df)
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

I tried multiple assignment:

df = df.with_columns(
    count_of_10, count_of_12 = uses_object(pl.col('x')),
)

but got an error:

NameError: name 'count_of_10' is not defined

Can I change the code to call uses_object only once?

Upvotes: 3

Views: 1374

Answers (2)

roman

Reputation: 117540

You can use list.to_struct() and unnest() to convert the returned list into separate columns:

df.with_columns(
    cnt=pl.col('x').map_elements(uses_object)
).with_columns(
    pl.col('cnt').list.to_struct(fields=['count_of_10','count_of_12'])
).unnest('cnt')

┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

Upvotes: 1

jqurious

Reputation: 21580

If you return a dictionary from your function:

return dict(count_of_10=c10, count_of_12=c12)

You will get a struct column:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
)
shape: (2, 2)
┌─────┬───────────┐
│ x   ┆ count     │
│ --- ┆ ---       │
│ i64 ┆ struct[2] │
╞═════╪═══════════╡
│ 11  ┆ {1,0}     │
│ 22  ┆ {1,1}     │
└─────┴───────────┘

You can then .unnest() it into individual columns:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
).unnest('count')
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

With your current approach (returning a tuple), you would call the function once and then use Polars list methods to extract the values in a separate .with_columns / .select, e.g.:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
).with_columns(
   count_of_10 = pl.col('count').list.first(),
   count_of_12 = pl.col('count').list.last(),
).drop('count')
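As for the NameError in the question: inside a call, `name_a, name_b = value` is not tuple unpacking. Python reads it as a positional argument `name_a` followed by a keyword argument `name_b=value`, so it tries to look up `name_a` as a variable. A small plain-Python sketch of this, where `show_call` and `undefined_name` are hypothetical names used only for illustration:

```python
def show_call(*args, **kwargs):
    """Hypothetical helper: report how Python split the call's arguments."""
    return args, kwargs

count_of_10 = "positional"  # defined here only so the first call can run

# Same shape as the attempt in the question.
args, kwargs = show_call(
    count_of_10, count_of_12 = (1, 0),
)
print(args)    # ('positional',)
print(kwargs)  # {'count_of_12': (1, 0)}

# Without a prior definition, evaluating the bare name fails
# before the function is even called.
try:
    show_call(
        undefined_name, count_of_12 = (1, 0),
    )
except NameError as exc:
    message = str(exc)
    print(message)
```

This is why the function has to return a single value (a list or a struct) that is split into columns afterwards.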

Upvotes: 2
