user554319

Reputation:

How to select columns by data type in Polars?

In pandas we have the pandas.DataFrame.select_dtypes method that selects certain columns depending on the dtype. Is there a similar way to do such a thing in Polars?

Upvotes: 13

Views: 15185

Answers (3)

user459872

Reputation: 24602

Starting from Polars 0.18.1, you can use the polars.selectors.by_dtype selector to select all columns matching the given dtypes.

>>> import polars as pl
>>> import polars.selectors as cs
>>> 
>>> df = pl.DataFrame(
...     {
...         "id": [1, 2, 3],
...         "name": ["John", "Jane", "Jake"],
...         "else": [10.0, 20.0, 30.0],
...     }
... )
>>> 
>>> print(df.select(cs.by_dtype(pl.String, pl.Int64)))
shape: (3, 2)
┌─────┬──────┐
│ id  ┆ name │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ John │
│ 2   ┆ Jane │
│ 3   ┆ Jake │
└─────┴──────┘

To select all non-numeric columns, negate a numeric selector with ~:

>>> import polars as pl
>>> import polars.selectors as cs
>>> 
>>> df = pl.DataFrame(
...     {
...         "id": [1, 2, 3],
...         "name": ["John", "Jane", "Jake"],
...         "else": [10.0, 20.0, 30.0],
...     }
... )
>>> 
>>> print(df.select(~cs.by_dtype(pl.NUMERIC_DTYPES)))
>>> # Or, more simply: print(df.select(~cs.numeric()))
shape: (3, 1)
┌──────┐
│ name │
│ ---  │
│ str  │
╞══════╡
│ John │
│ Jane │
│ Jake │
└──────┘

Upvotes: 8

user554319

Reputation:

You can also pass one or more data types directly to pl.col:

import polars as pl

df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["John", "Jane", "Jake"],
        "else": [10.0, 20.0, 30.0],
    }
)
print(df.select(pl.col(pl.String, pl.Int64)))

Output:

shape: (3, 2)
┌─────┬──────┐
│ id  ┆ name │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ John │
│ 2   ┆ Jane │
│ 3   ┆ Jake │
└─────┴──────┘

Upvotes: 14

code_stable

Reputation: 306

When working with a whole group of data types, such as all numeric dtypes, you can use the dtype-group selectors in polars.selectors directly.

Groups include: categorical, date, datetime, float, integer, numeric, string, temporal and time.

import polars as pl
import polars.selectors as cs

# df is assumed to be an existing DataFrame
# strip whitespace from all string columns
df = df.with_columns(cs.string().str.strip_chars())

# convert all numeric types to float32
df = df.with_columns(cs.numeric().cast(pl.Float32))

Upvotes: 2
