Reputation: 171
I would like to add multiple columns at once to a Polars DataFrame, where each column in a row derives from the same object. The object should be created only once per row, and each column should hold the result of one of its methods. Here is a simplified example using a range object:
import polars as pl

df = pl.DataFrame({
    'x': [11, 22],
})

def uses_object(x):
    r = list(range(0, x))
    c10 = r.count(10)
    c12 = r.count(12)
    return c10, c12

df = df.with_columns(
    count_of_10 = pl.col('x').map_elements(lambda x: uses_object(x)[0]),
    count_of_12 = pl.col('x').map_elements(lambda x: uses_object(x)[1]),
)
print(df)
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘
I tried multiple assignment:
df = df.with_columns(
    count_of_10, count_of_12 = uses_object(pl.col('x')),
)
but got the error:
NameError: name 'count_of_10' is not defined
Can I change the code so that uses_object is called only once per row?
Upvotes: 3
Views: 1374
Reputation: 117540
You can use list.to_struct() and unnest() to convert the returned list into separate columns:
df.with_columns(
    cnt=pl.col('x').map_elements(uses_object)
).with_columns(
    pl.col('cnt').list.to_struct(fields=['count_of_10', 'count_of_12'])
).unnest('cnt')
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘
Upvotes: 1
Reputation: 21580
If you return a dictionary from your function:
return dict(count_of_10=c10, count_of_12=c12)
You will get a struct column:
df.with_columns(
    count = pl.col('x').map_elements(uses_object)
)
shape: (2, 2)
┌─────┬───────────┐
│ x   ┆ count     │
│ --- ┆ ---       │
│ i64 ┆ struct[2] │
╞═════╪═══════════╡
│ 11  ┆ {1,0}     │
│ 22  ┆ {1,1}     │
└─────┴───────────┘
You can then .unnest() the struct into individual columns:
df.with_columns(
    count = pl.col('x').map_elements(uses_object)
).unnest('count')
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘
As for your current approach, which returns a tuple, you would call the function once and then use Polars list methods to extract the values in a separate .with_columns / .select, e.g.
df.with_columns(
    count = pl.col('x').map_elements(uses_object)
).with_columns(
    count_of_10 = pl.col('count').list.first(),
    count_of_12 = pl.col('count').list.last(),
).drop('count')
Upvotes: 2