seb2704

Reputation: 540

Concatenate multiple columns into a list in a single column

I'd like to combine multiple columns as a list into a single column.

For example, this data frame:

import polars as pl
import numpy as np

df = pl.from_repr("""
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
""")

into this one:

┌────────────┐
│ combine    │
│ ---        │
│ list [i64] │
╞════════════╡
│ [1, 4]     │
│ [2, 5]     │
│ [3, 6]     │
└────────────┘

Right now I'm doing it this way:

df = df.with_columns(
    pl.map_batches(
        ["a", "b"],
        # The lambda receives a list of Series, one per input column
        lambda cols: pl.Series(
            np.column_stack([cols[0].to_numpy(), cols[1].to_numpy()]).tolist()
        ),
    ).alias("combine")
)

Is there a better way to do it?

Upvotes: 3

Views: 2742

Answers (3)

ritchie46

Reputation: 14690

Update: reshape now returns a (fixed-width) Array type in Polars.

For lists, pl.concat_list("a", "b") can be used directly.


Original answer

With the landing of this PR, we can reshape a Series/Expr into a Series/Expr of type List. These can then be concatenated per row.

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6]
})


df.select(
    pl.concat_list(
        pl.col("a").reshape((-1, 1)), 
        pl.col("b").reshape((-1, 1))
    )
)

Outputs:

shape: (3, 1)
┌────────────┐
│ a          │
│ ---        │
│ list [i64] │
╞════════════╡
│ [1, 4]     │
│ [2, 5]     │
│ [3, 6]     │
└────────────┘

Note that we give the shape (-1, 1), where -1 means infer the dimension size. So this reads as (infer the rows, 1 column).

You can compile Polars from source to use this new feature, or wait a few days until it lands on PyPI.

Upvotes: 3

Hericks

Reputation: 9974

In modern polars, pl.concat_list can be used directly (without the need to reshape any columns).

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
df.select(pl.concat_list("a", "b", 2 * pl.col("b")))

shape: (3, 1)
┌────────────┐
│ a          │
│ ---        │
│ list[i64]  │
╞════════════╡
│ [1, 4, 8]  │
│ [2, 5, 10] │
│ [3, 6, 12] │
└────────────┘

Upvotes: 2

Riccardo Bucco

Reputation: 15364

Try this (in pandas, not Polars):

df.apply(list, axis=1)

Here you can see an example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>> df.apply(list, axis=1)
0    [1, 4]
1    [2, 5]
2    [3, 6]

Upvotes: 0
