exlo

Reputation: 325

'For loop' isn't creating modified copies of existing dataframes

I would like to create modified copies of dataframes that I have loaded in my Jupyter notebook: copies containing only certain columns, with new names (in this example df1_m, df2_m, etc.). But the following code produces empty tables when I print the copies by their new names. When I run the code from inside the loop on its own, it works, so placing it in a for loop is causing some issue. What could be going wrong?

This is the 'for loop' code that doesn't work/produces empty dfs:

parameter_cols = [col1, col2, col3]

df1_m = pd.DataFrame()
df2_m = pd.DataFrame()
df3_m = pd.DataFrame()
df4_m = pd.DataFrame()
df5_m = pd.DataFrame()
df6_m = pd.DataFrame()

df_list = [df1, df2, df3, df4, df5, df6]
df_m_list = [df1_m, df2_m, df3_m, df4_m, df5_m, df6_m]
year_list = [2015, 2016, 2017, 2018, 2019, 2020]

for df, df_m, yr in zip(df_list, df_m_list, year_list):
    df_m = df[parameter_cols]
    df_m = df_m.assign(year = yr)

However, the same code outside the loop works and produces the desired copy df (df1_m):

df1_m = df1[parameter_cols]
df1_m = df1_m.assign(year=2015)

Why is this?

Upvotes: 0

Views: 144

Answers (2)

Quang Hoang

Reputation: 150735

Your setup seems like overkill. You can forget about df1_m, df2_m, etc. entirely. Since df.assign returns a new DataFrame rather than modifying the original, you can just do:

parameter_cols = [col1, col2, col3]
df_list = [df1, df2, df3, df4, df5, df6]
year_list = [2015, 2016, 2017, 2018, 2019, 2020]

df_m_list = [
    d[parameter_cols].assign(year=y)
    for d, y in zip(df_list, year_list)
]
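
If the copies still need to be looked up individually, one option is a dictionary keyed by year instead of separate variable names. This is only a sketch: the small frames, the quoted column names, and the name df_m_by_year are made-up stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for df1..df3; column names are assumed to be strings.
parameter_cols = ["col1", "col2", "col3"]
df_list = [
    pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "extra": [7, 8]})
    for _ in range(3)
]
year_list = [2015, 2016, 2017]

# One modified copy per year; assign() returns a new DataFrame each time,
# so the frames in df_list are left unchanged.
df_m_by_year = {
    yr: df[parameter_cols].assign(year=yr)
    for df, yr in zip(df_list, year_list)
}

print(df_m_by_year[2015])
```

Each copy is then available as df_m_by_year[2015], df_m_by_year[2016], and so on.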

Upvotes: 1

Marat

Reputation: 15738

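df_m = df[parameter_cols] only rebinds the loop variable df_m to a new object; it does not change the element stored in df_m_list (or the variables df1_m, df2_m, etc.). Assigning back into the list by index does the job: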

for i, yr in enumerate(year_list):
    df_m = df_list[i][parameter_cols]
    df_m_list[i] = df_m.assign(year = yr)
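
The same rebinding behaviour can be seen without pandas. In this sketch plain lists stand in for the dataframes, and the names a, b, and items are purely illustrative.

```python
# Plain lists stand in for DataFrames; name binding works the same for any object.
a, b = [], []
items = [a, b]

for item in items:
    # Rebinds the loop name `item` only; `a`, `b`, and `items` are untouched.
    item = [1, 2, 3]

rebound_result = [list(x) for x in items]  # snapshot: still two empty lists

# Assigning by index stores the new object in the list itself.
for i in range(len(items)):
    items[i] = [1, 2, 3]

print(rebound_result)
print(items)
print(a, b)
```

Note that a and b stay empty even after the index-based loop: only the list slots are replaced. Likewise, with the fix above the results live in df_m_list[0], df_m_list[1], etc., while df1_m, df2_m, etc. remain the empty dataframes they were created as.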

Upvotes: 2
