Kaustubh Ursekar

Reputation: 185

Understanding variable declaration in the for loop statement of Python

I found some code online for creating dummy columns from my date column, which has only three values: 1800, 1900, and 2000.

The name 'yr' is used inside the function when it is defined, but it has not been declared anywhere before that. 'yr' only appears later, in the 'for loop', where 'apply' is used to build the dummies. I understand that looping over 'yr' creates three columns for 1800, 1900, and 2000 in the 'movies' dataframe.

But then:

1.) Does Python allow 'yr' to be declared in a for loop without being initialized beforehand?

2.) How is the 'date' column of the 'movies' df passed to the function without also passing 'yr'? I cannot work out what the 'if' statement inside the function is comparing each value of the 'date' column with.

I cannot follow how 'yr' gets from the for loop into the function, where each 'date' value 'val' is compared in the 'if' statement.

Please help!

# Return century of movie as a dummy column
def add_movie_year(val):
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function
for yr in ['18', '19', '20']:
    movies[str(yr) + "00's"] = movies['date'].apply(add_movie_year)

Upvotes: 1

Views: 97

Answers (2)

mooglinux

Reputation: 845

yr can be used in the function body because Python looks a name up when the function is called, not when it is defined. By the time apply actually invokes the function, the for loop has already assigned yr at module level, so the lookup succeeds. Functions are able to read names from outside their own scope (this is what makes imported modules usable inside a function), but relying on a global that keeps changing, as this code does, is generally bad practice.
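
As a rough illustration of that call-time lookup, here is a minimal sketch that uses plain lists instead of a DataFrame. The names starts_with_prefix, prefix, and results are made up for this example and are not from the question's code.

```python
def starts_with_prefix(val):
    # `prefix` is neither a parameter nor a local variable, so Python
    # looks it up in the enclosing (here: global) scope on every call.
    return 1 if val[:2] == prefix else 0

# Calling the function before `prefix` exists fails at call time,
# not at definition time.
try:
    starts_with_prefix('1800')
except NameError as exc:
    print("before the loop:", exc)

results = {}
for prefix in ['18', '19', '20']:
    # The for loop rebinds `prefix` before each batch of calls.
    results[prefix] = [starts_with_prefix(d) for d in ['1800', '1900', '2000']]

print(results)
```

Defining the function raises no error; only the call made before the loop does, which is why the code in the question works even though yr is assigned after the def.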

Upvotes: 1

Shenan

Reputation: 338

The confusion comes from yr being picked up from outside the function. You should make yr a parameter of add_movie_year and tell apply to pass it in as an extra function input through args.

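import pandas as pd

movies = pd.DataFrame({'date': ['1800', '1900', '2000']})

# Return century of movie as a dummy column
def add_movie_year(val, yr):
    # val is one value from the 'date' column; yr is the two-digit prefix to match
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function, passing the current yr explicitly as an extra argument
for yr in ['18', '19', '20']:
    movies[yr + "00's"] = movies['date'].apply(add_movie_year, args=(yr,))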
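
To check the idea without pandas, here is a small sketch of the same two-argument function applied to a plain list. The dates list and the dummies dict are stand-ins invented for this example; they only mimic the DataFrame columns.

```python
def add_movie_year(val, yr):
    # Both inputs are explicit parameters, so nothing is read from outer scope.
    return 1 if val[:2] == yr else 0

# The function can be called on its own, outside any loop.
print(add_movie_year('1900', '19'))

dates = ['1800', '1900', '2000']  # stand-in for movies['date']
dummies = {}                      # stand-in for the new DataFrame columns
for yr in ['18', '19', '20']:
    dummies[yr + "00's"] = [add_movie_year(d, yr) for d in dates]

print(dummies)
```

Because yr is now an argument, the function gives the same answer wherever it is called from, which makes it easier to test and reuse.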

Upvotes: 1
