Data Mastery

Reputation: 2085

Applying a custom function to pandas Series produces AttributeError

I want to create a custom summary function for a pandas Series.

df["tmk"].min()
df["tmk"].max()

This works.

def min_max(x):
    minimum = x.min()
    maximum = x.max()
    print(f'Min: {minimum} | Max: {maximum}')

df["tmk"].apply(lambda x: min_max(x))

AttributeError: 'float' object has no attribute 'min'

I guess I am making a mistake here. Can anyone show me how to apply the function correctly?

Upvotes: 2

Views: 421

Answers (3)

jezrael

Reputation: 862671

Series.apply calls the function once for each value of the column, so the function receives scalars, not the Series. The error means that a scalar float has no min or max method.
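
As a quick check of that per-value behaviour, here is a minimal sketch. The record helper and the seen list are hypothetical names used only for illustration; they collect every argument that Series.apply hands to the function:

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 5, 4, 5, 5, np.nan], name="tmk")

seen = []  # hypothetical container, only used to inspect what apply passes in


def record(x):
    # x is a single element of the Series, not the Series itself
    seen.append(x)
    return x


s.apply(record)

# One call per element, and every argument is a scalar float
print(len(seen) == len(s))
print(all(isinstance(x, float) for x in seen))
```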

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'tmk': [4, 5, 4, 5, 5, np.nan],
})


def min_max(x):
    minimum = x.min()
    maximum = x.max()
    print(f'Min: {minimum} | Max: {maximum}')

To process all values of the column at once, use Series.pipe:

df["tmk"].pipe(min_max)

Or pass the Series to the function directly, as @AkshayNevrekar mentioned in the comments:

min_max(df["tmk"])

Another idea is to use DataFrame.apply. The double brackets [[ ]] select a one-column DataFrame, so the function receives the whole column as a Series:

df[["tmk"]].apply(min_max)

Min: 4.0 | Max: 5.0
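
If you want to reuse the numbers instead of only printing them, one possible variation (an assumption on my side, not part of the original function) is to return them. A minimal sketch, with min_max_values as a hypothetical name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"tmk": [4, 5, 4, 5, 5, np.nan]})


def min_max_values(x):
    # x is expected to be a whole Series; min/max skip NaN by default
    return x.min(), x.max()


# Series.pipe(func) is equivalent to calling func(series) directly
via_pipe = df["tmk"].pipe(min_max_values)
direct = min_max_values(df["tmk"])
print(via_pipe == direct)

minimum, maximum = via_pipe
print(f"Min: {minimum} | Max: {maximum}")
```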

Another option is to use Series.describe or Series.agg:

print(df['tmk'].describe())

count    5.000000
mean     4.600000
std      0.547723
min      4.000000
25%      4.000000
50%      5.000000
75%      5.000000
max      5.000000
Name: tmk, dtype: float64

print(df['tmk'].agg(['min', 'max']))

min    4.0
max    5.0
Name: tmk, dtype: float64

It is also possible to format the output, as @Jon Clements mentioned, thank you:

print('Min: {min} | Max: {max}'.format_map(df['tmk'].agg(['min', 'max'])))

Min: 4.0 | Max: 5.0
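
A similar sketch, assuming you prefer a plain dict over passing the Series to format_map: Series.to_dict turns the agg result into a mapping that can be unpacked into str.format. The names stats and message are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"tmk": [4, 5, 4, 5, 5, np.nan]})

# agg returns a Series indexed by 'min' and 'max'; to_dict makes it a plain dict
stats = df["tmk"].agg(["min", "max"]).to_dict()
message = "Min: {min} | Max: {max}".format(**stats)
print(message)
```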

Upvotes: 3

QtRoS

Reputation: 1177

For this kind of analysis, just use the describe method of the Series.

If you want an explanation of your mistake, here it is. Doing this:

df["tmk"].apply(lambda x: min_max(x))

you are applying your function to every value in your series. Each value has type 'float', and floats in Python don't have a max or min method. Instead you can use:

df["tmk"].min()

or maybe the built-in Python min/max, like:

min(df["tmk"])
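
One caveat with the built-in functions, shown as a small sketch on made-up data: Series.min skips NaN by default, while the built-in min compares elements one by one, and every comparison with NaN is False. If the first element is NaN, the built-in returns NaN:

```python
import math

import numpy as np
import pandas as pd

# Made-up data with NaN in the first position
s = pd.Series([np.nan, 4.0, 5.0])

pandas_min = s.min()   # skipna=True by default, so NaN is ignored
builtin_min = min(s)   # starts from NaN and never replaces it

print(pandas_min, builtin_min)
print(math.isnan(builtin_min))
```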

Upvotes: 1

PyRsquared

Reputation: 7338

If you just want the min and max, it may be easier to use the describe() method:

import pandas as pd
import numpy as np

# fix seeds so we get the same numbers
np.random.seed(42)
a = np.random.normal(0, 1, 10)
np.random.seed(42)
b = np.random.uniform(0, 1, 10)

df = pd.DataFrame({"A": a, "B": b})
df.describe()

               A          B
count  10.000000  10.000000
mean    0.448061   0.520137
std     0.723008   0.315866
min    -0.469474   0.058084
25%    -0.210169   0.210649
50%     0.519637   0.599887
75%     0.737498   0.726014
max     1.579213   0.950714

You can read the min, max, and other metrics from there.
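
For instance, a minimal sketch of pulling just those two rows out of the describe() result with .loc (same seeded data as above; the name summary is only illustrative):

```python
import numpy as np
import pandas as pd

# Same seeded data as in the example above
np.random.seed(42)
a = np.random.normal(0, 1, 10)
np.random.seed(42)
b = np.random.uniform(0, 1, 10)
df = pd.DataFrame({"A": a, "B": b})

summary = df.describe()

# Keep only the 'min' and 'max' rows for every column
print(summary.loc[["min", "max"]])

# Or read a single value by row label and column name
print(summary.loc["min", "A"], summary.loc["max", "A"])
```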

Upvotes: 0
