Reputation: 2085
I want to create a custom summary function for pandas Series.
Calling the methods directly works:
df["tmk"].min()
df["tmk"].max()
But wrapping them in a function and applying it fails:
def min_max(x):
    minimum = x.min()
    maximum = x.max()
    print(f'Min: {minimum} | Max: {maximum}')
df["tmk"].apply(lambda x: min_max(x))
AttributeError: 'float' object has no attribute 'min'
I guess I am making a mistake here. Can anyone help me apply the function correctly?
Upvotes: 2
Views: 421
Reputation: 862671
If you use Series.apply, it loops over each value of the column, so the function receives one scalar at a time. The error means that scalars (here a float) have no min or max method.
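The element-by-element behaviour of Series.apply can be checked with a small sketch that records what the function receives on each call; `record` and `received` are illustrative names, not pandas API.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 5, 4, 5, 5, np.nan], name="tmk")

# Collect every argument that Series.apply passes to the function.
received = []

def record(value):
    received.append(value)
    return value

s.apply(record)

print(len(received))                                     # one call per element
print(all(np.ndim(v) == 0 for v in received))            # each argument is a scalar
print(any(isinstance(v, pd.Series) for v in received))   # never the whole Series
```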
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'tmk': [4, 5, 4, 5, 5, np.nan],
})

def min_max(x):
    minimum = x.min()
    maximum = x.max()
    print(f'Min: {minimum} | Max: {maximum}')
You need to process all values of the column at once, for example with Series.pipe:
df["tmk"].pipe(min_max)
Or pass the Series to the function directly, as @AkshayNevrekar mentioned in the comments:
min_max(df["tmk"])
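A variation worth considering (a sketch, not part of the original answer): let the function return the values instead of printing them, so the result can be reused. `min_max_values` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"tmk": [4, 5, 4, 5, 5, np.nan]})

def min_max_values(x: pd.Series) -> tuple:
    """Return (min, max) of a whole Series; pandas skips NaN by default."""
    return x.min(), x.max()

# Series.pipe(f) calls f(series), so both lines pass the whole column.
via_pipe = df["tmk"].pipe(min_max_values)
direct = min_max_values(df["tmk"])

minimum, maximum = via_pipe
print(f"Min: {minimum} | Max: {maximum}")
```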
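Another idea is to use DataFrame.apply on a one-column DataFrame (the double brackets [] select it), so the function receives the whole column:
df[["tmk"]].apply(min_max)
Each of the three approaches above prints:
Min: 4.0 | Max: 5.0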
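Because DataFrame.apply hands each column to the function as a Series, the same idea extends to several columns. A minimal sketch, assuming the function returns a Series instead of printing; the column `other` and the name `min_max_frame` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tmk": [4, 5, 4, 5, 5, np.nan],
    "other": [1, 2, 3, 4, 5, 6],  # hypothetical second column
})

def min_max_frame(col: pd.Series) -> pd.Series:
    # Returning a Series makes DataFrame.apply build a table:
    # one row per label ("min", "max"), one column per input column.
    return pd.Series({"min": col.min(), "max": col.max()})

summary = df.apply(min_max_frame)
print(summary)
```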
Another option is to use Series.describe or Series.agg:
print(df['tmk'].describe())
count    5.000000
mean     4.600000
std      0.547723
min      4.000000
25%      4.000000
50%      5.000000
75%      5.000000
max      5.000000
Name: tmk, dtype: float64
print(df['tmk'].agg(['min', 'max']))
min    4.0
max    5.0
Name: tmk, dtype: float64
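The list passed to agg can hold more reduction names, and the resulting Series converts to a plain dict, which is convenient inside a custom summary function. A short sketch; the choice of "mean" and "count" here is just an example.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 5, 4, 5, 5, np.nan], name="tmk")

# agg with a list of names returns a Series indexed by those names; NaN is skipped.
stats = s.agg(["min", "max", "mean", "count"])

# to_dict() gives a plain mapping of statistic name -> value.
print(stats.to_dict())
```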
It is also possible to format the result directly with str.format_map, as @Jon Clements mentioned (thank you):
print('Min: {min} | Max: {max}'.format_map(df['tmk'].agg(['min', 'max'])))
Min: 4.0 | Max: 5.0
Upvotes: 3
Reputation: 1177
For this kind of analysis, just use the describe method of the Series.
If you want an explanation of your mistake, here it is. By doing this:
df["tmk"].apply(lambda x: min_max(x))
you are applying your function to every value in your Series. Each value has type 'float', and floats in Python don't have min or max methods. Instead you can use:
df["tmk"].min()
or the built-in Python min/max functions:
min(df["tmk"])
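One caveat with the built-in functions: they compare items with `<` and do not skip NaN the way Series.min does, so the result depends on where a NaN sits. A small sketch with the NaN deliberately placed first:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 4, 5, 4], name="tmk")  # NaN placed first on purpose

# Series.min skips NaN by default (skipna=True).
print(s.min())

# The built-in min keeps the first item unless a later one compares smaller;
# every comparison with NaN is False, so the leading NaN is returned.
print(min(s))
```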
Upvotes: 1
Reputation: 7338
If you just want the min and max, it may be easier to use the describe() method:
import pandas as pd
import numpy as np
# fix seeds so we get the same numbers
np.random.seed(42)
a = np.random.normal(0, 1, 10)
np.random.seed(42)
b = np.random.uniform(0, 1, 10)
df = pd.DataFrame({"A": a, "B": b})
df.describe()
               A          B
count  10.000000  10.000000
mean    0.448061   0.520137
std     0.723008   0.315866
min    -0.469474   0.058084
25%    -0.210169   0.210649
50%     0.519637   0.599887
75%     0.737498   0.726014
max     1.579213   0.950714
You can read the min, max and other statistics from there.
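To keep only the min and max rows of that table, the describe() result can be indexed with .loc, since its index holds the statistic names. A sketch reusing the same random data:

```python
import numpy as np
import pandas as pd

# Same seeded data as above.
np.random.seed(42)
a = np.random.normal(0, 1, 10)
np.random.seed(42)
b = np.random.uniform(0, 1, 10)
df = pd.DataFrame({"A": a, "B": b})

# Select just the "min" and "max" rows of the describe() output.
min_max = df.describe().loc[["min", "max"]]
print(min_max)
```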
Upvotes: 0