Reputation: 427
I have a dataframe with 2 columns: the first column contains lists, and the second column contains integer indexes.
How can I get the element of each list in the first column at the index specified in the second column? Or, even better, put that element in a 3rd column.
Input example:
import polars as pl

df = pl.DataFrame({
    "lst": [[1, 2, 3], [4, 5, 6]],
    "ind": [1, 2]
})
┌───────────┬─────┐
│ lst ┆ ind │
│ --- ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 1 │
│ [4, 5, 6] ┆ 2 │
└───────────┴─────┘
Expected output:
res = df.with_columns(pl.Series("list[ind]", [2, 6]))
┌───────────┬─────┬───────────┐
│ lst ┆ ind ┆ list[ind] │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ i64 │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1 ┆ 2 │
│ [4, 5, 6] ┆ 2 ┆ 6 │
└───────────┴─────┴───────────┘
Thanks.
Upvotes: 9
Views: 10981
Reputation: 10039
For selecting list elements by an index taken from another column, polars provides pl.Expr.list.get. You can even pass the index column directly by name.
df.with_columns(
    pl.col("lst").list.get("ind").alias("list[ind]")
)
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst ┆ ind ┆ list[ind] │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ i64 │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1 ┆ 2 │
│ [4, 5, 6] ┆ 2 ┆ 6 │
└───────────┴─────┴───────────┘
Upvotes: 3
Reputation: 188
Update: this can now be done more simply with:
df.with_columns(pl.col("lst").list.get(pl.col("ind")).alias("list[ind]"))
Original answer
You can use with_row_index() to add a row index column for grouping, then explode() the list so that each list element sits on its own row. Then call gather() with the ind column and evaluate it over() the row index column, which selects the element from each subgroup.
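import polars as pl

df = pl.DataFrame({"lst": [[1, 2, 3], [4, 5, 6]], "ind": [1, 2]})
df = (
    df.with_row_index()  # adds an "index" column: 0, 1, ...
    .with_columns(
        pl.col("lst")
        .explode()  # one list element per row within each group
        .gather(pl.col("ind"))  # pick the element at position "ind"
        .over(pl.col("index"))  # evaluated separately for each original row
        .alias("list[ind]")
    )
    .drop("index")
)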
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst ┆ ind ┆ list[ind] │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ i64 │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1 ┆ 2 │
│ [4, 5, 6] ┆ 2 ┆ 6 │
└───────────┴─────┴───────────┘
Upvotes: 12
Reputation: 1488
Here is my approach:
Create a custom function that returns the list element at the required index.
def get_elem(d):
    # d is [idx, lista]: the struct's field values, in field order
    sel_idx = d[0]
    return d[1][sel_idx]
Here is some test data:
import polars as pl

df = pl.DataFrame({'lista': [[1, 2, 3], [4, 5, 6]], 'idx': [1, 2]})
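Now let's create a struct from these two columns (each row is passed to the function as a dict) and apply the function above. The field order idx, lista matters here, because get_elem reads the values by position: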
df.with_columns(
    pl.struct('idx', 'lista')
    .map_elements(lambda x: get_elem(list(x.values())))
    .alias('req_elem')
)
shape: (2, 3)
┌───────────┬─────┬──────────┐
│ lista ┆ idx ┆ req_elem │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ i64 │
╞═══════════╪═════╪══════════╡
│ [1, 2, 3] ┆ 1 ┆ 2 │
│ [4, 5, 6] ┆ 2 ┆ 6 │
└───────────┴─────┴──────────┘
Upvotes: 1
Reputation: 372
If the number of unique values in ind isn't absolutely massive, you can build a when/then expression that selects based on the value of ind, using list.get(idx) for each unique idx:
import polars as pl

df = pl.DataFrame([{"lst": [1, 2, 3], "ind": 1}, {"lst": [4, 5, 6], "ind": 2}])

# create a when/then expression for each unique index
idxs = df["ind"].unique()
ind, lst = pl.col("ind"), pl.col("lst")  # makes the expression generator look cleaner
expr = pl.when(ind == idxs[0]).then(lst.list.get(idxs[0]))
for idx in idxs[1:]:
    expr = expr.when(ind == idx).then(lst.list.get(idx))
expr = expr.otherwise(None)

df.select(expr)
shape: (2, 1)
┌─────┐
│ lst │
│ --- │
│ i64 │
╞═════╡
│ 2 │
│ 6 │
└─────┘
Upvotes: 1