Reputation: 575
How can I achieve the equivalents of SQL's IN and NOT IN?
I have a list with the required values. Here's the scenario:
import pandas as pd
import polars as pl
exclude_fruit = ["apple", "orange"]
df = pl.DataFrame(
{
"A": [1, 2, 3, 4, 5, 6],
"fruits": ["banana", "banana", "apple", "apple", "banana", "orange"],
"B": [5, 4, 3, 2, 1, 6],
"cars": ["beetle", "audi", "beetle", "beetle", "beetle", "frog"],
"optional": [28, 300, None, 2, -30, 949],
}
)
df.filter(~pl.select("fruits").str.contains(exclude_fruit))
df.filter(~pl.select("fruits").to_pandas().isin(exclude_fruit))
df.filter(~pl.select("fruits").isin(exclude_fruit))
Upvotes: 21
Views: 27446
Reputation:
You were close.
df.filter(~pl.col('fruits').is_in(exclude_fruit))
shape: (3, 5)
┌─────┬────────┬─────┬────────┬──────────┐
│ A ┆ fruits ┆ B ┆ cars ┆ optional │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ i64 │
╞═════╪════════╪═════╪════════╪══════════╡
│ 1 ┆ banana ┆ 5 ┆ beetle ┆ 28 │
│ 2 ┆ banana ┆ 4 ┆ audi ┆ 300 │
│ 5 ┆ banana ┆ 1 ┆ beetle ┆ -30 │
└─────┴────────┴─────┴────────┴──────────┘
Upvotes: 35
Reputation: 1129
Alternatively, you can also add .not_()
df.filter(pl.col('fruits').is_in(exclude_fruit).not_())
which is handy for node.js, where the tilde ~
doesn't work:
df.filter(pl.col('fruits').isIn(exclude_fruit).not())
Upvotes: 4