Kaster

Reputation: 427

(Polars) How to get element from a column with list by index specified in another column

I have a dataframe with 2 columns, where the first column contains lists and the second column contains integer indexes.

How can I get the element of the list in the first column at the index specified in the second column? Even better, how can I put that element in a third column?

Input example

import polars as pl

df = pl.DataFrame({
    "lst": [[1, 2, 3], [4, 5, 6]],
    "ind": [1, 2]
})
┌───────────┬─────┐
│ lst       ┆ ind │
│ ---       ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 1   │
│ [4, 5, 6] ┆ 2   │
└───────────┴─────┘

Expected output.

res = df.with_columns(pl.Series("list[ind]", [2, 6]))
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

Thanks.

Upvotes: 9

Views: 10981

Answers (4)

Hericks

Reputation: 10039

To select list elements by an index taken from another column, Polars provides pl.Expr.list.get. You can even pass the index column directly by name.

df.with_columns(
    pl.col("lst").list.get("ind").alias("list[ind]")
)
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

Upvotes: 3

cccs31

Reputation: 188

Update: This can now be done more easily with

df.with_columns(pl.col("lst").list.get(pl.col("ind")).alias("list[ind]"))

Original answer

You can use with_row_index() to add a row index column for grouping, then explode() the list so that each list element is on its own row. Then call gather() with the "ind" column, windowed by the row index column via over(), to select the element from each subgroup.

import polars as pl

df = pl.DataFrame({"lst": [[1, 2, 3], [4, 5, 6]], "ind": [1, 2]})

df = (
    df.with_row_index()
    .with_columns(
        pl.col("lst").explode().gather(pl.col("ind")).over(pl.col("index")).alias("list[ind]")
    )
    .drop("index")
)
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

Upvotes: 12

myamulla_ciencia

Reputation: 1488

Here is my approach:

Create a custom function that returns the value at the required index.

def get_elem(d):
    sel_idx = d[0]
    return d[1][sel_idx]

Here is some test data.

df = pl.DataFrame({'lista':[[1,2,3],[4,5,6]],'idx':[1,2]})

Now let's create a struct from these two columns (each row is passed to the function as a dict) and apply the above function:

df.with_columns(
    pl.struct('idx','lista').map_elements(lambda x: get_elem(list(x.values()))).alias('req_elem'))
shape: (2, 3)
┌───────────┬─────┬──────────┐
│ lista     ┆ idx ┆ req_elem │
│ ---       ┆ --- ┆ ---      │
│ list[i64] ┆ i64 ┆ i64      │
╞═══════════╪═════╪══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2        │
│ [4, 5, 6] ┆ 2   ┆ 6        │
└───────────┴─────┴──────────┘

Upvotes: 1

NedDasty

Reputation: 372

If the number of unique values in ind isn't very large, you can build a when/then expression that checks the value of ind and, for each unique idx, selects the element with list.get(idx):

import polars as pl

df = pl.DataFrame([{"lst": [1, 2, 3], "ind": 1}, {"lst": [4, 5, 6], "ind": 2}])

# create when/then expression for each unique index
idxs = df["ind"].unique()
ind, lst = pl.col("ind"), pl.col("lst") # makes expression generator look cleaner

expr = pl.when(ind == idxs[0]).then(lst.list.get(idxs[0]))
for idx in idxs[1:]:
    expr = expr.when(ind == idx).then(lst.list.get(idx))
expr = expr.otherwise(None)

df.select(expr)
shape: (2, 1)
┌─────┐
│ lst │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 6   │
└─────┘

Upvotes: 1
