Polars filter dataframe with multiple conditions

Question

I've got this pandas code:

import pandas as pd
from datetime import timedelta

df['date_col'] = pd.to_datetime(df['date_col'], format='%Y-%m-%d')
row['date_col'] = pd.to_datetime(row['date_col'], format='%Y-%m-%d')

df = df[(df['groupby_col'] == row['groupby_col']) &
        (row['date_col'] - df['date_col'] <= timedelta(days=10)) &
        (row['date_col'] - df['date_col'] > timedelta(days=0))]

row['mean_col'] = df['price_col'].mean()

The name row comes from the fact that this code runs inside a function that is applied row by row through a lambda.
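For context, here is a minimal runnable sketch of that row-wise pattern in pandas. It assumes the function is applied with DataFrame.apply(..., axis=1) over a separate frame of query rows; the frame "rows", the helper "window_mean" and all the data values are hypothetical.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample data: df holds the history, rows holds the query rows.
df = pd.DataFrame({
    "date_col": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "groupby_col": [1, 1, 2],
    "price_col": [10.0, 20.0, 30.0],
})
rows = pd.DataFrame({
    "date_col": ["2023-01-07", "2023-01-12", "2023-01-25"],
    "groupby_col": [1, 1, 2],
})
df["date_col"] = pd.to_datetime(df["date_col"], format="%Y-%m-%d")
rows["date_col"] = pd.to_datetime(rows["date_col"], format="%Y-%m-%d")


def window_mean(row):
    """Hypothetical helper: mean price of same-group rows 1 to 10 days earlier."""
    delta = row["date_col"] - df["date_col"]
    sub = df[(df["groupby_col"] == row["groupby_col"])
             & (delta <= timedelta(days=10))
             & (delta > timedelta(days=0))]
    return sub["price_col"].mean()  # NaN when nothing matches


# One Python call per query row: this is the part that does not vectorize.
rows["mean_col"] = rows.apply(lambda row: window_mean(row), axis=1)
print(rows)
```

Each call re-scans all of df, which is why a join-based approach scales better in Polars.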

I'm subsetting df with two types of conditions:

1. An equality condition on the column named "groupby_col".
2. Conditions on a time range, based on the "date_col" column, which holds timestamps.

I'm pretty sure that filter is the correct method to use:

df.filter(condition_1 & condition_2)

but I'm struggling to write the conditions. To embed condition 1, do I have to nest a filter, or is when the correct choice? How do I translate the timedelta conditions? And how do I replicate the lambda approach?

roman · Accepted Answer

It's a bit hard to understand your example without test data, but suppose we create some sample data:

import polars as pl
import datetime

df = pl.DataFrame({
   "date_col": ['2023-01-01','2023-01-02', '2023-01-03'],
   "groupby_col": [1,2,3],
})

row = pl.DataFrame({
   "date_col": ['2023-01-07','2023-01-08', '2023-01-25'],
   "groupby_col": [1,2,3],
})

df = df.with_columns(pl.col('date_col').str.to_datetime().cast(pl.Date))
row = row.with_columns(pl.col('date_col').str.to_datetime().cast(pl.Date))

Then you can filter on multiple conditions by first joining the two dataframes on "groupby_col" and then filtering the joined result:

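(df
   .join(row, on=["groupby_col"])  # the joined row's date becomes "date_col_right"
   .filter(
       # same window as the pandas code: 0 days < difference <= 10 days
       pl.col("date_col_right") - pl.col("date_col") > datetime.timedelta(days=0),
       pl.col("date_col_right") - pl.col("date_col") <= datetime.timedelta(days=10),
   ).drop('date_col_right')
)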

shape: (2, 2)
┌────────────┬─────────────┐
│ date_col   ┆ groupby_col │
│ ---        ┆ ---         │
│ date       ┆ i64         │
╞════════════╪═════════════╡
│ 2023-01-01 ┆ 1           │
│ 2023-01-02 ┆ 2           │
└────────────┴─────────────┘
