Reputation: 600
Consider a Polars data frame with a column of str
type that indicates the date in the format '27 July 2020'
.
I would like to convert this column to the polars.datetime
type, which is distinct from the Python standard datetime
.
import polars as pl
from datetime import datetime
df = pl.DataFrame({
"id": [1, 2],
"event_date": ["27 July 2020", "31 December 2020"]
})
df = df.with_columns(
pl.col("event_date").map_elements(lambda x: x.replace(" ", "-"))
.map_elements(lambda x: datetime.strptime(x, "%d-%B-%Y"))
)
shape: (2, 2)
┌─────┬─────────────────────┐
│ id ┆ event_date │
│ --- ┆ --- │
│ i64 ┆ datetime[μs] │
╞═════╪═════════════════════╡
│ 1 ┆ 2020-07-27 00:00:00 │
│ 2 ┆ 2020-12-31 00:00:00 │
└─────┴─────────────────────┘
Suppose we try to process df
further to create a new column indicating the quarter of the year an event took place.
df.with_columns(
pl.col("event_date").map_elements(lambda x: x.month)
.map_elements(lambda x: 1 if x in range(1,4) else 2 if x in range(4,7) else 3 if x in range(7,10) else 4)
.alias("quarter")
)
shape: (2, 3)
┌─────┬─────────────────────┬─────────┐
│ id ┆ event_date ┆ quarter │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ i64 │
╞═════╪═════════════════════╪═════════╡
│ 1 ┆ 2020-07-27 00:00:00 ┆ 3 │
│ 2 ┆ 2020-12-31 00:00:00 ┆ 4 │
└─────┴─────────────────────┴─────────┘
How would I do this in Polars without applying custom lambdas through map_elements
?
Upvotes: 25
Views: 45533
Reputation:
The easiest way to convert strings to Date/Datetime is to use Polars' own functions:
df.with_columns(
pl.col("event_date").str.to_datetime("%d %B %Y")
)
shape: (2, 2)
┌─────┬─────────────────────┐
│ id ┆ event_date │
│ --- ┆ --- │
│ i64 ┆ datetime[μs] │
╞═════╪═════════════════════╡
│ 1 ┆ 2020-07-27 00:00:00 │
│ 2 ┆ 2020-12-31 00:00:00 │
└─────┴─────────────────────┘
The Temporal section of the docs shows the supported functions in the .dt
namespace.
In the case of your second example, there is a dedicated quarter expression:
df = df.with_columns(
pl.col("event_date").str.to_datetime("%d %B %Y")
).with_columns(
pl.col("event_date").dt.quarter().alias("quarter")
)
shape: (2, 3)
┌─────┬─────────────────────┬─────────┐
│ id ┆ event_date ┆ quarter │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ i8 │
╞═════╪═════════════════════╪═════════╡
│ 1 ┆ 2020-07-27 00:00:00 ┆ 3 │
│ 2 ┆ 2020-12-31 00:00:00 ┆ 4 │
└─────┴─────────────────────┴─────────┘
Upvotes: 45