Reputation: 185
I found some code online for creating dummy columns from my date column, which has only three values: 1800, 1900, 2000.
The name 'yr' is used inside the function when it is defined, but it has not been declared before that point. 'yr' only appears later as the 'for' loop variable, and 'apply' is then used to build the dummies. I understand that the for loop over 'yr' creates three columns, 1800's, 1900's and 2000's, in the 'movies' dataframe.
But then:
1.) Does Python allow 'yr' to be used in a function and only assigned later in a for loop, without initializing it first?
2.) How can the 'date' column of the 'movies' df be passed to the function without also passing 'yr'? I cannot work out what the 'if' statement inside the function is comparing each value of the 'date' column with.
I do not understand how 'yr' gets from the for loop into the function, where each 'date' value 'val' is compared in the 'if' statement.
Please help!
# Return century of movie as a dummy column
def add_movie_year(val):
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function
for yr in ['18', '19', '20']:
    movies[str(yr) + "00's"] = movies['date'].apply(add_movie_year)
Upvotes: 1
Views: 97
Reputation: 845
yr can be used in the function body because names inside a function are looked up when the function runs, not when it is defined. By the time the function is actually invoked, the for loop has already assigned yr, so the lookup succeeds. Functions are able to read variables outside their own scope (this is necessary to be able to use imports), but relying on it for ordinary data like this is generally bad practice.
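A minimal sketch of that lookup rule, using plain lists instead of a dataframe. The names `starts_with_prefix`, `prefix`, and `results` are made up for illustration and are not from the question.

```python
# Hypothetical names, for illustration only.
def starts_with_prefix(val):
    # `prefix` is neither a parameter nor a local variable, so Python
    # looks it up in the global namespace each time the function is called.
    return 1 if val[:2] == prefix else 0

# Calling the function before `prefix` exists fails at call time,
# not at definition time.
called_too_early = False
try:
    starts_with_prefix('1800')
except NameError:
    called_too_early = True

results = {}
for prefix in ['18', '19', '20']:
    # The for loop assigns the global `prefix` before each batch of calls.
    results[prefix] = [starts_with_prefix(d) for d in ['1800', '1900', '2000']]

print(called_too_early)  # True
print(results)  # {'18': [1, 0, 0], '19': [0, 1, 0], '20': [0, 0, 1]}
```

The definition itself never fails; only a call made while the global name is still missing raises NameError.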
Upvotes: 1
Reputation: 338
The reason you are having this problem is that yr is never passed to the function. You should make yr a parameter of your add_movie_year function and tell apply to pass yr to it as an extra argument through args.
import pandas as pd

movies = pd.DataFrame({'date': ['1800', '1900', '2000']})

# Return century of movie as a dummy column
def add_movie_year(val, yr):
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function, passing yr explicitly as an extra positional argument
for yr in ['18', '19', '20']:
    movies[str(yr) + "00's"] = movies['date'].apply(add_movie_year, args=(yr,))
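Another way to make the dependency on yr explicit is a small factory function that returns a one-argument function bound to a given prefix. This is only a sketch with plain lists; `make_century_flag`, `dates`, and `flags` are hypothetical names, not part of the original code.

```python
# Hypothetical helper, for illustration only.
def make_century_flag(yr):
    """Return a function that flags values whose first two characters equal yr."""
    def flag(val):
        # `yr` here is the argument given to make_century_flag,
        # captured by the inner function, not a global.
        return 1 if val[:2] == yr else 0
    return flag

dates = ['1800', '1900', '2000']
flags = {}
for yr in ['18', '19', '20']:
    century_flag = make_century_flag(yr)
    flags[yr + "00's"] = [century_flag(d) for d in dates]

print(flags)
# {"1800's": [1, 0, 0], "1900's": [0, 1, 0], "2000's": [0, 0, 1]}
```

Because the returned function takes a single value, it could be handed to apply directly, for example movies['date'].apply(make_century_flag(yr)), without needing args.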
Upvotes: 1