Dropping column in dataframe with assignment not working in a loop

Question

I have two dataframes (df_train and df_test) containing a column ('Date') that I want to drop.

As far as I understand, I can do it in two ways: either with inplace=True, or by assigning the result of drop back to the same variable:

if 'Date' in df_train.columns:
    df_train.drop(['Date'], axis=1, inplace=True)

OR

if 'Date' in df_train.columns:
    df_train = df_train.drop(['Date'], axis=1)

Both methods work on a single dataframe, but the first one should be more memory-friendly, since the assignment creates a copy of the dataframe.
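A small self-contained check of both ways on a single dataframe might look like the sketch below; the toy columns and values are made up for illustration. It shows that drop with inplace=True returns None and mutates the object, while drop without it returns a new dataframe and leaves the original untouched until the name is rebound. Note that pandas often still builds a new copy internally even with inplace=True, so the memory advantage may be smaller than expected.

```python
import pandas as pd

# Toy data standing in for df_train (values are made up).
df = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], "x": [1, 2]})

# Way 1: mutate the existing object; drop() returns None in this mode.
df_inplace = df.copy()
result = df_inplace.drop(["Date"], axis=1, inplace=True)
print(result)                    # None
print(list(df_inplace.columns))  # ['x']

# Way 2: drop() returns a new DataFrame; the original is untouched
# until the name is rebound to the result.
df_assign = df.copy()
new_df = df_assign.drop(["Date"], axis=1)
print(list(df_assign.columns))   # ['Date', 'x']
print(list(new_df.columns))      # ['x']
df_assign = new_df               # rebinding the name is what "drops" the column
print(list(df_assign.columns))   # ['x']
```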

Since I have to do this for both dataframes, I tried doing the same within a loop:

for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data.drop(['Date'], axis=1, inplace=True)

and

for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data = data.drop(['Date'], axis=1)

The weird thing is that, in this case, only the first way (using inplace) works. If I use the second way, the 'Date' columns aren't dropped. Why is that?
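A minimal reproduction of the two loops, assuming two tiny made-up frames in place of the real df_train and df_test, could look like this. The helper make_frames is hypothetical and only exists so each loop starts from fresh data.

```python
import pandas as pd

def make_frames():
    # Hypothetical helper: two toy frames standing in for df_train and df_test.
    train = pd.DataFrame({"Date": ["2020-01-01"], "a": [1]})
    test = pd.DataFrame({"Date": ["2020-01-02"], "a": [2]})
    return train, test

# Loop 1: inplace=True modifies the objects the list refers to,
# which are the same objects df_train and df_test refer to.
df_train, df_test = make_frames()
for data in [df_train, df_test]:
    if "Date" in data.columns:
        data.drop(["Date"], axis=1, inplace=True)
print(list(df_train.columns), list(df_test.columns))  # ['a'] ['a']

# Loop 2: the assignment only rebinds the loop variable `data`;
# df_train2 and df_test2 still refer to the unchanged originals.
df_train2, df_test2 = make_frames()
for data in [df_train2, df_test2]:
    if "Date" in data.columns:
        data = data.drop(["Date"], axis=1)
print(list(df_train2.columns), list(df_test2.columns))  # ['Date', 'a'] ['Date', 'a']
```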

U13-Forward · Accepted Answer

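It doesn't work because data = data.drop(...) only rebinds the loop variable data to the new dataframe that drop returns. Neither the list nor the names df_train and df_test are affected; they still refer to the original dataframes. With inplace=True, by contrast, the object that data and df_train both refer to is modified directly. So collect the results and reassign them, for example: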

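lst = []
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data = data.drop(['Date'], axis=1)
    lst.append(data)  # append every frame, whether or not it had 'Date'
df_train, df_test = lst  # rebind the original names to the new frames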

Now lst contains all the dataframes with 'Date' dropped, and df_train and df_test refer to those new dataframes.
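A more compact variant of the same idea, sketched with made-up toy frames: drop(columns=..., errors="ignore") skips labels that are not present, so the 'Date' in columns check becomes unnecessary, and unpacking a list comprehension rebinds both names in one step.

```python
import pandas as pd

# Toy frames standing in for df_train and df_test; only one has 'Date'.
df_train = pd.DataFrame({"Date": ["2020-01-01"], "a": [1]})
df_test = pd.DataFrame({"a": [2]})

# errors="ignore" silently skips missing labels instead of raising KeyError.
# Unpacking rebinds both names to the new DataFrames returned by drop().
df_train, df_test = [
    d.drop(columns=["Date"], errors="ignore") for d in (df_train, df_test)
]

print(list(df_train.columns))  # ['a']
print(list(df_test.columns))   # ['a']
```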
