Des1303
Des1303

Reputation: 129

How to use a function returning a tuple to add multiple columns to data frame in Polars

I have a dataframe with a single column:

df = pl.DataFrame({
    'a':[1,2,3,4]
})
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
└─────┘

And a function from int to (int, int):

def f(i): return (i*10, i*100)

I want to use f to add 2 columns b,c to the data frame. In the process I want to explicitly set the datatype for those 2 columns (in reality they are not ints, could be a Polars struct, array, etc.)

I tried:

df.with_columns(pl.col('a').map_elements(f).alias('temp'))

But can't get this to work in general as f is returning a tuple of complex dtypes.

┌─────┬───────────┐
│ a   ┆ temp      │
│ --- ┆ ---       │
│ i64 ┆ object    │
╞═════╪═══════════╡
│ 1   ┆ (10, 100) │
│ 2   ┆ (20, 200) │
│ 3   ┆ (30, 300) │
│ 4   ┆ (40, 400) │
└─────┴───────────┘

Desired result:

shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 10  ┆ 100 │
│ 2   ┆ 20  ┆ 200 │
│ 3   ┆ 30  ┆ 300 │
│ 4   ┆ 40  ┆ 400 │
└─────┴─────┴─────┘

Upvotes: 1

Views: 794

Answers (1)

Dean MacGregor
Dean MacGregor

Reputation: 18691

To answer the question as asked you can do this:

(
    df
    .with_columns(pl.col('a').map_elements(f).alias('temp'))
    .with_columns(pl.col('temp').list.to_struct(fields=['b','c']))
    .unnest('temp')
    .with_columns(pl.col('b','c').cast(pl.Int32))
)

You can't chain the list.to_struct in the same with_columns as the map_elements. I think it has to do with map_elements looping on each element so it isn't, for lack of a better term, a fully formed column yet. If you want the output to be a specific dtype, you have to cast it after you unnest.

That being said, using map_elements is an anti-pattern and should be avoided wherever possible as it undoes all of the optimizations that make polars fast. Ideally you'd convert your f into polars expressions. If there's no way to make f into polars expressions but it can be vectorized then use map_batches instead of map_elements.

Upvotes: 0

Related Questions