Polars interpolate_by fails when Null is at beginning or end

Question

I've noticed some unexpected behavior with the interpolate_by expression and I'm not sure what is going on.

import polars as pl

df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [4, 5, None, 7, 8]
})
df = df.with_columns(interpolate = pl.col('b').interpolate_by('a'))
print(df)

results in this:

┌─────┬──────┬─────────────┐
│ a   ┆ b    ┆ interpolate │
│ --- ┆ ---  ┆ ---         │
│ i64 ┆ i64  ┆ f64         │
╞═════╪══════╪═════════════╡
│ 1   ┆ 4    ┆ 4.0         │
│ 2   ┆ 5    ┆ 5.0         │
│ 3   ┆ null ┆ 6.0         │
│ 4   ┆ 7    ┆ 7.0         │
│ 5   ┆ 8    ┆ 8.0         │
└─────┴──────┴─────────────┘

which is correct. However this:

df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [4, 5, 6, 7, None]
})
df = df.with_columns(interpolate = pl.col('b').interpolate_by('a'))
print(df)

results in this:

shape: (5, 3)
┌─────┬──────┬─────────────┐
│ a   ┆ b    ┆ interpolate │
│ --- ┆ ---  ┆ ---         │
│ i64 ┆ i64  ┆ f64         │
╞═════╪══════╪═════════════╡
│ 1   ┆ 4    ┆ 4.0         │
│ 2   ┆ 5    ┆ 5.0         │
│ 3   ┆ 6    ┆ 6.0         │
│ 4   ┆ 7    ┆ 7.0         │
│ 5   ┆ null ┆ null        │
└─────┴──────┴─────────────┘

which is not correct. There is still plenty of data to perform a linear interpolation on column b using the data in column a. Am I misunderstanding how this is supposed to work, or is this a bug?
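To make the expectation concrete, here is a minimal pure-Python sketch of the two behaviors in question. The helper `fill_linear` and its `extrapolate` flag are hypothetical names made up for illustration and are not part of Polars; the sketch also assumes the values in `a` are strictly increasing. With `extrapolate=False` it only fills nulls that have a known value on both sides, which matches the output shown above. With `extrapolate=True` it extends the line through the two nearest known points, which is the result I was expecting.

```python
from typing import Optional, Sequence


def fill_linear(
    xs: Sequence[float],
    ys: Sequence[Optional[float]],
    extrapolate: bool = False,
) -> list[Optional[float]]:
    """Fill None entries of ys linearly against xs.

    Hypothetical helper, not Polars code. Assumes xs is strictly increasing.
    """
    known = [i for i, y in enumerate(ys) if y is not None]
    out = [None if y is None else float(y) for y in ys]
    if len(known) < 2:
        return out  # fewer than two known points cannot define a line

    def line(i0: int, i1: int, x: float) -> float:
        # Value at x on the straight line through points i0 and i1.
        slope = (ys[i1] - ys[i0]) / (xs[i1] - xs[i0])
        return ys[i0] + slope * (x - xs[i0])

    for i, y in enumerate(ys):
        if y is not None:
            continue
        left = [k for k in known if k < i]
        right = [k for k in known if k > i]
        if left and right:
            # Interior null: interpolate between the nearest neighbors.
            out[i] = line(left[-1], right[0], xs[i])
        elif extrapolate and right:
            # Leading null: extend the line through the first two known points.
            out[i] = line(right[0], right[1], xs[i])
        elif extrapolate and left:
            # Trailing null: extend the line through the last two known points.
            out[i] = line(left[-2], left[-1], xs[i])
    return out


a = [1, 2, 3, 4, 5]
interior = fill_linear(a, [4, 5, None, 7, 8])
trailing = fill_linear(a, [4, 5, 6, 7, None])
trailing_extrapolated = fill_linear(a, [4, 5, 6, 7, None], extrapolate=True)
```

Strictly speaking, the second case is extrapolation rather than interpolation, since the missing value lies outside the range of known points.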
