Reputation: 17
I have a column in a dataset which has null values (which are to be predicted), among other values.
I wanted to create an is_null
column which says whether the first column's values were null or not (element-wise).
I came across the .map_elements
method, but it "skipped" the null values. Here's an example:
import polars as pl
df = pl.DataFrame({"foo": [1, None, 3], "bar": [-1, None, 8]})
# shape: (3, 2)
# ┌──────┬──────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞══════╪══════╡
# │ 1 ┆ -1 │
# │ null ┆ null │
# │ 3 ┆ 8 │
# └──────┴──────┘
def print_and_fill(value):
    print("Value is", value)
    return 1
df["foo"].map_elements(print_and_fill)
## Output ##
# Value is 1
# Value is 3
# shape: (3,)
# Series: 'foo' [i64]
# [
# 1
# null
# 1
# ]
Clearly, the null value was skipped. Is there any way to apply the function to all values?
I came across a workaround: temporarily .fill_null()
and then call .map_elements()
, but this is clearly not the best solution.
Upvotes: 2
Views: 1531
Reputation:
map_elements has a skip_nulls= parameter, which defaults to True.
In general, it's best to avoid using map_elements
unless absolutely necessary.
I wanted to create an is_null column which says whether the first column's values were null or not (element-wise).
One easy way is to use the is_null expression. For example:
(
df
.with_columns(
pl.col('foo').is_null().alias('foo_is_null')
)
)
shape: (3, 3)
┌──────┬──────┬─────────────┐
│ foo ┆ bar ┆ foo_is_null │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ bool │
╞══════╪══════╪═════════════╡
│ 1 ┆ -1 ┆ false │
│ null ┆ null ┆ true │
│ 3 ┆ 8 ┆ false │
└──────┴──────┴─────────────┘
Upvotes: 2