Reputation: 129
I have a dataframe with a single column:
import polars as pl

df = pl.DataFrame({
    'a': [1, 2, 3, 4]
})
shape: (4, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
└─────┘
And a function from int to (int, int):
def f(i): return (i*10, i*100)
I want to use f to add two columns, b and c, to the data frame. In the process I want to explicitly set the dtype of those two columns (in reality they are not ints; they could be a Polars struct, array, etc.).
I tried:
df.with_columns(pl.col('a').map_elements(f).alias('temp'))
But I can't get this to work in general, because f returns a tuple of complex dtypes:
┌─────┬───────────┐
│ a ┆ temp │
│ --- ┆ --- │
│ i64 ┆ object │
╞═════╪═══════════╡
│ 1 ┆ (10, 100) │
│ 2 ┆ (20, 200) │
│ 3 ┆ (30, 300) │
│ 4 ┆ (40, 400) │
└─────┴───────────┘
Desired result:
shape: (4, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ 10 ┆ 100 │
│ 2 ┆ 20 ┆ 200 │
│ 3 ┆ 30 ┆ 300 │
│ 4 ┆ 40 ┆ 400 │
└─────┴─────┴─────┘
Upvotes: 1
Views: 794
Reputation: 18691
To answer the question as asked you can do this:
(
    df
    .with_columns(pl.col('a').map_elements(f).alias('temp'))
    .with_columns(pl.col('temp').list.to_struct(fields=['b','c']))
    .unnest('temp')
    .with_columns(pl.col('b','c').cast(pl.Int32))
)
You can't chain list.to_struct in the same with_columns as the map_elements. I think this is because map_elements loops over each element, so the result isn't, for lack of a better term, a fully formed column yet. If you want the output to have a specific dtype, you have to cast it after you unnest.
That being said, using map_elements is an anti-pattern and should be avoided wherever possible, as it undoes all of the optimizations that make Polars fast. Ideally you'd convert your f into Polars expressions. If there's no way to express f as Polars expressions but it can be vectorized, use map_batches instead of map_elements.
Upvotes: 0