Reputation: 3182
I have a dataframe with many columns that have occurances of inf
. I'd like to replace these with null
. All of the column names in question start with the string "ratio_".
This is what I've tried, but I get new columns with the title "literal", when I would like to replace the old ones:
import polars as pl
import numpy as np
df = pl.DataFrame({"label":["a", "b", "c"], "ratio_a":[1.,2.,np.inf]})
df.with_columns(
pl.when(pl.col("^ratio_\w+$").is_infinite())
.then(None)
.otherwise(pl.col("^ratio_\w+$")
)
)
shape: (3, 3)
┌───────┬─────────┬─────────┐
│ label ┆ ratio_a ┆ literal │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞═══════╪═════════╪═════════╡
│ a ┆ 1.0 ┆ 1.0 │
│ b ┆ 2.0 ┆ 2.0 │
│ c ┆ inf ┆ null │
└───────┴─────────┴─────────┘
I could solve this by looping over the column names instead, but I was surprised at the above behaviour. Is there a way to make the above expression work?
import numpy as np
df = pl.DataFrame({"label":["a", "b", "c"], "ratio_a":[1.,2.,np.inf]})
for col in df.columns:
if col.startswith("ratio_"):
df = df.with_columns(
pl.when(pl.col(col).is_infinite())
.then(None)
.otherwise(pl.col(col)
).alias(col)
)
df
Upvotes: 5
Views: 4059
Reputation: 21590
Expr.replace()
has since been added, which may also be an option.
df.with_columns(
pl.col("^ratio\w+$").replace([float("inf"), float("-inf")], None)
)
shape: (3, 5)
┌───────┬─────────┬─────────┬─────────┬─────────┐
│ label ┆ ratio_a ┆ ratio_b ┆ ratio_c ┆ ratio_d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪═════════╪═════════╪═════════╪═════════╡
│ a ┆ 1.0 ┆ 10.0 ┆ null ┆ -1.0 │
│ b ┆ 2.0 ┆ null ┆ 200.0 ┆ -2.0 │
│ c ┆ null ┆ 20.0 ┆ 300.0 ┆ null │
└───────┴─────────┴─────────┴─────────┴─────────┘
Upvotes: 2
Reputation:
You were on the right track. You can use the .name.keep()
expression.
Let's expand your example.
import polars as pl
import numpy as np
df = pl.DataFrame({"label": ["a", "b", "c"],
"ratio_a": [1.0, 2.0, np.inf],
"ratio_b": [10.0, np.inf, 20.0],
"ratio_c": [np.inf, 200, 300],
"ratio_d": [-1.0, -2.0, np.inf],
})
df
shape: (3, 5)
┌───────┬─────────┬─────────┬─────────┬─────────┐
│ label ┆ ratio_a ┆ ratio_b ┆ ratio_c ┆ ratio_d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪═════════╪═════════╪═════════╪═════════╡
│ a ┆ 1.0 ┆ 10.0 ┆ inf ┆ -1.0 │
│ b ┆ 2.0 ┆ inf ┆ 200.0 ┆ -2.0 │
│ c ┆ inf ┆ 20.0 ┆ 300.0 ┆ inf │
└───────┴─────────┴─────────┴─────────┴─────────┘
To your expression, just append .name.keep()
.
df.with_columns(
pl.when(pl.col(r"^ratio_\w+$").is_infinite())
.then(None)
.otherwise(pl.col(r"^ratio_\w+$"))
.name.keep()
)
shape: (3, 5)
┌───────┬─────────┬─────────┬─────────┬─────────┐
│ label ┆ ratio_a ┆ ratio_b ┆ ratio_c ┆ ratio_d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪═════════╪═════════╪═════════╪═════════╡
│ a ┆ 1.0 ┆ 10.0 ┆ null ┆ -1.0 │
│ b ┆ 2.0 ┆ null ┆ 200.0 ┆ -2.0 │
│ c ┆ null ┆ 20.0 ┆ 300.0 ┆ null │
└───────┴─────────┴─────────┴─────────┴─────────┘
As an aside, you should probably prepend r
to your regex string: r"^ratio_\w+$"
to avoid a deprecation warning in Python.
Upvotes: 8