robertdj
robertdj

Reputation: 1117

Select datetime columns in Polars

In Polars we can select columns by their type -- see e.g. How to select columns by data type in Polars?

However, I cannot figure out how to select datetime columns. From Polars' tutorial we have this example:

import polars as pl
import datetime at dt

df_trades = pl.DataFrame(
    {
        "time": [
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 3, 0),
            dt.datetime(2020, 1, 1, 9, 6, 0),
        ],
        "stock": ["A", "B", "B", "C"],
        "trade": [101, 299, 301, 500],
    }
)

The column types are printed like this:

┌─────────────────────┬───────┬───────┐
│ time                ┆ stock ┆ trade │
│ ---                 ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ str   ┆ i64   │

For the stock column we can use

df_trades.select(pl.col(pl.String))

But what is the type for the time column? Using dt.datetime instead of pl.String gives an error:

TypeError: argument 'name': 'type' object cannot be converted to 'PyString'

Upvotes: 2

Views: 1243

Answers (2)

ritchie46
ritchie46

Reputation: 14670

You can select polars columns by datatypes as you did with pl.String, but you have to pass a polars datatype to make that work, not a python datetime.datetime value/object.

Below we select all datetime datatypes that have "us" resolution by passing the pl.Datetime("us") datatype.

import polars as pl
from datetime import datetime

df_trades = pl.DataFrame(
    {
        "time1": [
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 3, 0),
            datetime(2020, 1, 1, 9, 6, 0),
        ],
        "time2": [
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 3, 0),
            datetime(2020, 1, 1, 9, 6, 0),
        ],
        "stock": ["A", "B", "B", "C"],
        "trade": [101, 299, 301, 500],
    }
)

df_trades.select(
    pl.col(pl.Datetime("us"))
)
shape: (4, 2)
┌─────────────────────┬─────────────────────┐
│ time1               ┆ time2               │
│ ---                 ┆ ---                 │
│ datetime[μs]        ┆ datetime[μs]        │
╞═════════════════════╪═════════════════════╡
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
│ 2020-01-01 09:03:00 ┆ 2020-01-01 09:03:00 │
│ 2020-01-01 09:06:00 ┆ 2020-01-01 09:06:00 │
└─────────────────────┴─────────────────────┘

Upvotes: 1

user459872
user459872

Reputation: 24582

Starting from Polars 0.18.1 you can use polars.selectors.datetime selector to select only datetime columns.

>>> import polars as pl
>>> import polars.selectors as cs
>>> import datetime as dt
>>> 
>>> df_trades = pl.DataFrame(
...     {
...         "time": [
...             dt.datetime(2020, 1, 1, 9, 1, 0),
...             dt.datetime(2020, 1, 1, 9, 1, 0),
...             dt.datetime(2020, 1, 1, 9, 3, 0),
...             dt.datetime(2020, 1, 1, 9, 6, 0),
...         ],
...         "stock": ["A", "B", "B", "C"],
...         "trade": [101, 299, 301, 500],
...     }
... )
>>> 
>>> print(df_trades.select(cs.datetime()))
shape: (4, 1)
┌─────────────────────┐
│ time                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-01-01 09:01:00 │
│ 2020-01-01 09:01:00 │
│ 2020-01-01 09:03:00 │
│ 2020-01-01 09:06:00 │
└─────────────────────┘

Upvotes: 1

Related Questions