Reputation:
In pandas we have the pandas.DataFrame.select_dtypes method, which selects columns based on their dtype. Is there a similar way to do this in Polars?
Upvotes: 13
Views: 15185
Reputation: 24602
Starting from Polars 0.18.1, you can use the polars.selectors.by_dtype selector to select all columns matching the given dtypes.
>>> import polars as pl
>>> import polars.selectors as cs
>>>
>>> df = pl.DataFrame(
... {
... "id": [1, 2, 3],
... "name": ["John", "Jane", "Jake"],
... "else": [10.0, 20.0, 30.0],
... }
... )
>>>
>>> print(df.select(cs.by_dtype(pl.String, pl.Int64)))
shape: (3, 2)
┌─────┬──────┐
│ id ┆ name │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ John │
│ 2 ┆ Jane │
│ 3 ┆ Jake │
└─────┴──────┘
To select all non-numeric columns:
>>> import polars as pl
>>> import polars.selectors as cs
>>>
>>> df = pl.DataFrame(
... {
... "id": [1, 2, 3],
... "name": ["John", "Jane", "Jake"],
... "else": [10.0, 20.0, 30.0],
... }
... )
>>>
>>> print(df.select(~cs.by_dtype(pl.NUMERIC_DTYPES)))  # or: df.select(~cs.numeric())
shape: (3, 1)
┌──────┐
│ name │
│ --- │
│ str │
╞══════╡
│ John │
│ Jane │
│ Jake │
└──────┘
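Selectors also compose with set operations such as | (union), & (intersection) and - (difference). As a small additional sketch that is not part of the original answer, reusing the same imports and DataFrame as above, the union of the integer and string selectors picks out the id and name columns:
>>> # union of the integer and string selectors
>>> print(df.select(cs.integer() | cs.string()))
shape: (3, 2)
┌─────┬──────┐
│ id ┆ name │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ John │
│ 2 ┆ Jane │
│ 3 ┆ Jake │
└─────┴──────┘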
Upvotes: 8
Reputation:
One can pass data type(s) to pl.col:
import polars as pl
df = pl.DataFrame(
{
"id": [1, 2, 3],
"name": ["John", "Jane", "Jake"],
"else": [10.0, 20.0, 30.0],
}
)
print(df.select(pl.col(pl.String, pl.Int64)))
Output:
shape: (3, 2)
┌─────┬──────┐
│ id ┆ name │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ John │
│ 2 ┆ Jane │
│ 3 ┆ Jake │
└─────┴──────┘
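Because pl.col with data types is an ordinary expression, the same dtype-based selection also works in other contexts such as with_columns. As a small sketch that is not part of the original answer, reusing the df defined above, this upper-cases every string column in one go:
# apply a string operation to every String column at once
print(df.with_columns(pl.col(pl.String).str.to_uppercase()))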
Upvotes: 14
Reputation: 306
When working with groups of data types, such as numeric dtypes, you can use the polars.selectors dtype-group selectors directly.
Available groups include: categorical, date, datetime, float, integer, numeric, string, temporal and time.
import polars as pl
import polars.selectors as cs

# strip whitespace from all string columns
df = df.with_columns(cs.string().str.strip_chars())

# convert all numeric columns to Float32
df = df.with_columns(cs.numeric().cast(pl.Float32))
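For reference, a minimal self-contained sketch (not part of the original answer) that applies both snippets to the sample DataFrame from the other answers, with some whitespace added to the names so strip_chars has a visible effect, and then checks the resulting schema:
import polars as pl
import polars.selectors as cs

df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["  John", "Jane ", "Jake"],  # whitespace added for illustration
        "else": [10.0, 20.0, 30.0],
    }
)

out = df.with_columns(
    cs.string().str.strip_chars(),   # strip whitespace from string columns
    cs.numeric().cast(pl.Float32),   # cast numeric columns to Float32
)

print(out.schema)  # "id" and "else" are now Float32, "name" remains a string column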
Upvotes: 2