Reputation: 87
Many times I find myself in a situation where I have a DataFrame and one column has the type List[int].
For example, I have the following DF:
import polars as pl

df = pl.DataFrame(
    {
        "group": ["A", "A", "B", "B", "B", "B"],
        "value": [[3, 2, 5], [2, 2, 2], [2, 5, 9, 4], [5, 4, 7, 5, 1], [9, 4, 5], [2, 2]],
    }
)
Typically, I'm using the explode and group_by methods in such situations.
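For the example above, that pattern would look something like this:

# flatten the list column into one row per element, then aggregate per group
(
    df.explode('value')
      .group_by('group')
      .agg(pl.col('value').median())
)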
However, when dealing with numerous columns, the code can become somewhat messy.
To address this, I tried using the map_elements method:
(
    df
    .group_by('group')
    .agg(
        # concatenate each group's lists into a single flat Series
        pl.col('value').map_elements(lambda s: pl.concat(s))
    )
    .with_columns(
        # compute the median of each flattened Series
        pl.col('value').map_elements(lambda s: s.median())
    )
)
Unfortunately, this approach sacrifices the parallelization benefits that Polars offers, and its execution is quite resource-costly: with millions of rows, execution time can stretch from seconds to minutes.
Is there a better way to work with List[int]? Is there a good way to optimize my code?
Upvotes: 1
Views: 304
Reputation: 21580
There is an explode expression, which is also available via the .flatten() alias.
(df.group_by('group')
.agg(pl.col('value').flatten().median())
)
shape: (2, 2)
┌───────┬───────┐
│ group ┆ value │
│ --- ┆ --- │
│ str ┆ f64 │
╞═══════╪═══════╡
│ B ┆ 4.5 │
│ A ┆ 2.0 │
└───────┴───────┘
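If you have several such columns, the same expression can target them all at once. A minimal sketch, assuming a hypothetical second List column named other:

(
    df.group_by('group')
      .agg(
          # 'other' is a hypothetical second List column in your frame
          pl.col('value', 'other').flatten().median()
      )
)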
Upvotes: 0