Reputation: 540
I'd like to combine multiple columns as a list into a single column.
For example, this data frame:
import polars as pl
import numpy as np
df = pl.from_repr("""
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
""")
into this one:
┌────────────┐
│ combine │
│ --- │
│ list [i64] │
╞════════════╡
│ [1, 4] │
│ [2, 5] │
│ [3, 6] │
└────────────┘
Right now I'm doing it this way:
df = df.with_columns(
    pl.map_batches(
        ['a', 'b'],
        lambda df: pl.Series(np.column_stack([df[0].to_numpy(), df[1].to_numpy()]).tolist()),
    ).alias('combine')
)
Is there a better way to do it?
Upvotes: 3
Views: 2742
Reputation: 14690
Update: reshape now returns a (fixed-width) Array type in Polars.
For lists, pl.concat_list("a", "b") can be used directly.
Original answer
With the landing of this PR, we can reshape a Series/Expr into a Series/Expr of type List. These can then be concatenated per row.
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6]
})
df.select(
pl.concat_list(
pl.col("a").reshape((-1, 1)),
pl.col("b").reshape((-1, 1))
)
)
Outputs:
shape: (3, 1)
┌────────────┐
│ a │
│ --- │
│ list [i64] │
╞════════════╡
│ [1, 4] │
│ [2, 5] │
│ [3, 6] │
└────────────┘
Note that we give the shape (-1, 1), where -1 means infer the dimension size. So this reads as (infer the rows, 1 column).
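The -1 convention works the same way in NumPy, which may be a more familiar place to see it. The following is a small sketch using NumPy rather than Polars; the names a and col are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])

# -1 asks NumPy to infer the number of rows; 1 fixes the number of columns.
col = a.reshape((-1, 1))
print(col.shape)     # (3, 1)
print(col.tolist())  # [[1], [2], [3]]
```

Each original element ends up in its own single-element row, which is why concatenating two such columns row-wise gives two-element lists.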
You can compile polars from source to use this new feature, or wait a few days until it has landed on PyPI.
Upvotes: 3
Reputation: 9974
In modern polars, pl.concat_list can be used directly (without the need to reshape any columns).
import polars as pl
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
})
df.select(pl.concat_list("a", "b", 2 * pl.col("b")))
shape: (3, 1)
┌────────────┐
│ a │
│ --- │
│ list[i64] │
╞════════════╡
│ [1, 4, 8] │
│ [2, 5, 10] │
│ [3, 6, 12] │
└────────────┘
Upvotes: 2
Reputation: 15364
Try this (it uses pandas rather than polars):
df.apply(list, axis=1)
Here you can see an example:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>> df.apply(list, axis=1)
0    [1, 4]
1    [2, 5]
2    [3, 6]
Upvotes: 0