How can I summarize all columns of a polars dataframe?

Question

Pandas makes it easy to summarize columns of a dataframe with an arbitrary function using df.apply(my_func, axis=0).

How can I do the same in polars? Below is an MWE. The function is just an example (I would like to do this for arbitrary functions): it takes an entire column and reduces it to a single value. In pandas, the syntax above applies it to every column.

What is the syntax to perform the same operation in polars?

import polars as pl
import pandas as pd
import numpy as np

# Toy Data
data = {'a': [1, 2, 3, 4, 5],
        'b': [2, 4, 6, 8, 10]}

# Pandas and polars copies of the same data
df = pd.DataFrame(data)
pdf = pl.DataFrame(data)

# Function I want to use to summarize my columns
my_func = lambda x: np.log(x.mean())

# How to do this in pandas
df.apply(my_func, axis=0)

# How do I do the same in polars?

Wayoshi · Accepted Answer

You can use map_batches, which passes each selected column to the function as a whole Series:

pdf.select(pl.all().map_batches(my_func))

See the User-defined functions section in the User guide for more details.
