Reputation: 1241
While working on the Intro to Data Analysis course on Udacity, I stumbled upon the following. When writing the function below, I did not realize I had named the helper function standardize(series) the same as the main function standardize(df). To my surprise, the function worked fine.
How does the interpreter know which version of the standardize function to use when I pass it to df.apply(standardize)? Does it have anything to do with the class of the argument (Series vs. DataFrame)? Why doesn't it recurse? Or if it does, I cannot conceptualize how that works out step by step.
import pandas as pd

grades_df = pd.DataFrame(
    data={'exam1': [43, 81, 78, 75, 89, 70, 91, 65, 98, 87],
          'exam2': [24, 63, 56, 56, 67, 51, 79, 46, 72, 60]},
    index=['Andre', 'Barry', 'Chris', 'Dan', 'Emilio',
           'Fred', 'Greta', 'Humbert', 'Ivan', 'James']
)

def standardize(df):
    '''
    Fill in this function to standardize each column of the given
    DataFrame. To standardize a variable, convert each value to the
    number of standard deviations it is above or below the mean.
    '''
    def standardize(series):
        return (series - series.mean()) / series.std(ddof=0)
    return df.apply(standardize)

standardize(grades_df)
Upvotes: 2
Views: 99
Reputation: 2189
To answer "How does the interpreter know which version of the standardize function to use...": it is determined by the language's scoping rules, not by the type of the argument.
In most languages, names (identifiers for functions, variables, etc.) are resolved in the local scope first; if a name is not found there, the search moves to the next enclosing scope, and so on outward.
In this case, you have two definitions of the "standardize" function: the first is defined in the global scope of the interpreter, and the second is defined inside the scope of the first. When you call df.apply(standardize), the name "standardize" resolves to the locally defined function (the second), because the local scope is searched before any outer scope. No recursion occurs: the outer function never refers to itself, and the body of the inner function does not use the name "standardize" at all.
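To see the lookup order in isolation, here is a minimal sketch that does not need pandas. It is an illustration under assumptions, not the course's solution: a plain dict of lists stands in for the DataFrame, a dict comprehension stands in for df.apply, and statistics.pstdev plays the role of series.std(ddof=0). The variable names inner_name, result and outer_name are made up for the example.

```python
from statistics import mean, pstdev

def standardize(columns):
    # Outer function: the name "standardize" is bound in the global scope.
    def standardize(values):
        # Inner function: the same name, bound in the outer function's
        # local scope, where it shadows the global binding.
        m = mean(values)
        s = pstdev(values)  # population standard deviation (like ddof=0)
        return [(v - m) / s for v in values]

    # Inside the outer function, "standardize" now refers to the inner one.
    inner_name = standardize.__qualname__
    # Stand-in for df.apply(standardize): call the inner function per column.
    standardized = {name: standardize(vals) for name, vals in columns.items()}
    return standardized, inner_name

result, inner_name = standardize({'exam1': [1, 2, 3]})

# At module level the name still refers to the outer function.
outer_name = standardize.__qualname__

print(inner_name)   # standardize.<locals>.standardize
print(outer_name)   # standardize
```

The inner def rebinds the name only within the outer function's local scope, so the global binding is untouched and calls from outside still reach the outer function.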
Upvotes: 3