Reputation: 1123
I have been using polars, but it seems to lack the qcut functionality that pandas has.
I am not sure why, but is it possible to achieve the same effect as pandas qcut with the functionality currently available in polars?
The following example shows what I can do with pandas qcut.
import pandas as pd
data = pd.Series([11, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 5])
pd.qcut(data, [0, 0.2, 0.4, 0.6, 0.8, 1], labels=['q1', 'q2', 'q3', 'q4', 'q5'])
The results are as follows:
0 q5
1 q1
2 q1
3 q1
4 q3
5 q4
6 q5
7 q1
8 q1
9 q3
10 q4
11 q5
dtype: category
So, how can I get the same result using polars?
Thanks for your help.
Upvotes: 6
Views: 1233
Reputation: 21534
Update:
Series.qcut was added in polars version 0.16.15.
import polars as pl
data = pl.Series([11, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 5])
data.qcut([0.2, 0.4, 0.6, 0.8], labels=['q1', 'q2', 'q3', 'q4', 'q5'], maintain_order=True)
shape: (12, 3)
┌──────┬─────────────┬──────────┐
│ ┆ break_point ┆ category │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ cat │
╞══════╪═════════════╪══════════╡
│ 11.0 ┆ inf ┆ q5 │
│ 1.0 ┆ 2.0 ┆ q1 │
│ 2.0 ┆ 2.0 ┆ q1 │
│ 2.0 ┆ 2.0 ┆ q1 │
│ … ┆ … ┆ … │
│ 2.0 ┆ 2.0 ┆ q1 │
│ 3.0 ┆ 3.6 ┆ q3 │
│ 4.0 ┆ 4.8 ┆ q4 │
│ 5.0 ┆ inf ┆ q5 │
└──────┴─────────────┴──────────┘
Old answer:
As far as I can tell, pandas .qcut() uses linear interpolation to compute the quantiles that serve as the bin edges.
If so, you could implement that part "manually":
import polars as pl
data = pl.Series([11, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 5])
bins = [0.2, 0.4, 0.6, 0.8]
labels = ["q1", "q2", "q3", "q4", "q5"]
pl.cut(data, bins=[data.quantile(val, "linear") for val in bins], labels=labels)
shape: (12, 3)
┌──────┬─────────────┬──────────┐
│ ┆ break_point ┆ category │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ cat │
╞══════╪═════════════╪══════════╡
│ 1.0 ┆ 2.0 ┆ q1 │
│ 1.0 ┆ 2.0 ┆ q1 │
│ 2.0 ┆ 2.0 ┆ q1 │
│ 2.0 ┆ 2.0 ┆ q1 │
│ 2.0 ┆ 2.0 ┆ q1 │
│ 3.0 ┆ 3.6 ┆ q3 │
│ 3.0 ┆ 3.6 ┆ q3 │
│ 4.0 ┆ 4.8 ┆ q4 │
│ 4.0 ┆ 4.8 ┆ q4 │
│ 5.0 ┆ inf ┆ q5 │
│ 5.0 ┆ inf ┆ q5 │
│ 11.0 ┆ inf ┆ q5 │
└──────┴─────────────┴──────────┘
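To make the "linear quantile" step concrete, here is a minimal library-free sketch of the same idea. The helper names linear_quantile and qcut_sketch are made up for illustration and are not part of polars or pandas. It assumes right-closed bins, so a value equal to a break point stays in the lower bin, which appears to match the outputs above.

```python
import bisect

def linear_quantile(sorted_vals, q):
    """Quantile of pre-sorted data, linearly interpolated between the two nearest ranks."""
    pos = q * (len(sorted_vals) - 1)        # fractional rank of the quantile
    lo = int(pos)                           # rank just below (or at) that position
    hi = min(lo + 1, len(sorted_vals) - 1)  # rank just above, clamped to the last index
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def qcut_sketch(values, probs, labels):
    """Label each value by the quantile bin it falls into (hypothetical helper)."""
    ordered = sorted(values)
    breaks = [linear_quantile(ordered, q) for q in probs]
    # bisect_left counts the breaks strictly below v, so a value equal to a
    # break point is assigned to the lower bin (right-closed intervals).
    return [labels[bisect.bisect_left(breaks, v)] for v in values]

data = [11, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 5]
result = qcut_sketch(data, [0.2, 0.4, 0.6, 0.8], ["q1", "q2", "q3", "q4", "q5"])
print(result)
```

For this data the interior break points come out as roughly 2.0, 2.4, 3.6 and 4.8. No value falls between 2.0 and 2.4, which is why q2 never shows up in the results.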
Upvotes: 10