Pandas Series Apply Method

Question

I've been using the pandas apply method for both Series and DataFrame, but I am obviously still missing something, because I'm stumped on a simple function I'm trying to execute.

This is what I was doing:

def minmax(row):
    return (row - row.min())/(row.max() - row.min())

row.apply(minmax)

But this returns an all-zero Series. For example, if

row = pd.Series([0, 1, 2])

then

minmax(row)

returns [0.0, 0.5, 1.0], as desired. But row.apply(minmax) returns [0, 0, 0].

I believe this is because the Series holds ints and the integer division returns 0. However, I don't understand:

1. why it works with minmax(row) (shouldn't it act the same?), and
2. how to cast it correctly in the apply function to return appropriate float values (I've tried to cast it using .astype and this gives me all NaNs... which I also don't understand).
3. (Edit added) If I apply this to a DataFrame, as df.apply(minmax), it also works as desired.

I suspect I'm missing something fundamental in how apply works... or being dense. Either way, thanks in advance.

Romain · Accepted Answer

When you call row.apply(minmax) on a Series, each value is passed to the function on its own, not the Series as a whole. This is called element-wise.

Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

When you call df.apply(minmax) on a DataFrame, either whole columns (axis=0, the default) or whole rows (axis=1) are passed to the function, according to the value of axis.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty. This is called row-wise or column-wise.

This is why your example works as expected on the DataFrame and not on the Series. Check this answer for information on mapping functions to Series.
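
As a minimal sketch of the difference, you can check what each apply actually hands to your function. Here row and minmax come from the question; the DataFrame df and its columns a and b are made up for illustration.

```python
import pandas as pd


def minmax(values):
    """Rescale a whole Series to the range [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())


row = pd.Series([0, 1, 2])
# Hypothetical DataFrame, only here to illustrate the column-wise case.
df = pd.DataFrame({"a": [0, 1, 2], "b": [10, 20, 40]})

# Series.apply calls the function once per element,
# so the function never receives a Series.
saw_series_elementwise = row.apply(lambda x: isinstance(x, pd.Series))

# DataFrame.apply (default axis=0) calls the function once per column,
# and each column arrives as a Series.
saw_series_columnwise = df.apply(lambda col: isinstance(col, pd.Series))

# Calling the function directly on the Series gives the intended scaling.
scaled_row = minmax(row)

# On the DataFrame, apply scales each column independently.
scaled_df = df.apply(minmax)

print(saw_series_elementwise.tolist())
print(saw_series_columnwise.tolist())
print(scaled_row.tolist())
print(scaled_df)
```

So to rescale a single Series, call minmax(row) directly instead of row.apply(minmax): min() and max() only make sense on the whole Series, not on one value at a time.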
