Carlo

Reputation: 1576

Dropping a column in a dataframe with assignment not working in a loop

I have two dataframes (df_train and df_test) containing a column ('Date') that I want to drop.

As far as I understand, I can do it in two ways: either by using inplace or by assigning the result back to the dataframe, like:

if 'Date' in df_train.columns:
    df_train.drop(['Date'], axis=1, inplace=True)

OR

if 'Date' in df_train.columns:
    df_train = df_train.drop(['Date'], axis=1)

Both methods work on a single dataframe, but I assume the former is more memory friendly, since with the assignment a copy of the dataframe is created.

I have to do this for both dataframes, so I tried the same thing in a loop:

for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data.drop(['Date'], axis=1, inplace=True)

and

for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data = data.drop(['Date'], axis=1)

The weird thing is that, in this case, only the first way (using inplace) works. If I use the second way, the 'Date' columns aren't dropped. Why is that?

Upvotes: 1

Views: 88

Answers (2)

Mayank Porwal

Reputation: 34086

It's better to use a list comprehension:

res = [data.drop(['Date'], axis=1) for data in [df_train, df_test] if 'Date' in data.columns]

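Here, res holds a new dataframe, with 'Date' dropped, for each input that had the column. df_train and df_test themselves are not modified, and a dataframe without a 'Date' column is left out of res entirely.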
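If the results are needed under the original names, one option is to unpack the list back into them. The following is only a minimal sketch with made-up sample data (the column x and its values are hypothetical). It swaps the if filter for drop's errors="ignore" argument, so that a dataframe without 'Date' is passed through unchanged instead of being skipped and the unpacking always gets one result per input:

```python
import pandas as pd

# Hypothetical sample data; df_test deliberately has no 'Date' column.
df_train = pd.DataFrame({"Date": ["2020-01-01"], "x": [1]})
df_test = pd.DataFrame({"x": [2]})

# drop() returns a new dataframe each time; unpacking rebinds the
# original names to those new dataframes. errors="ignore" makes drop
# a no-op for a dataframe that lacks the column.
df_train, df_test = [
    data.drop(columns=["Date"], errors="ignore")
    for data in (df_train, df_test)
]

print(list(df_train.columns), list(df_test.columns))
```

This still creates new dataframes rather than modifying the old ones in place, so it behaves like the assignment form from the question, not like inplace=True.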

Upvotes: 1

U13-Forward

Reputation: 71610

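It doesn't work because data = data.drop(...) only rebinds the loop variable data to the new dataframe that drop returns. The list, and the names df_train and df_test, still refer to the original, unchanged dataframes. With inplace=True, drop modifies the object that data refers to, which is the same object that df_train (or df_test) refers to, so the change is visible afterwards. To keep the assignment style, collect the results instead: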

lst = []
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        lst.append(data.drop(['Date'], axis=1))
print(lst)

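Now lst contains the new dataframes without the 'Date' column. df_train and df_test themselves are unchanged, so assign the results back if you need them under those names, e.g. df_train, df_test = lst (this assumes both dataframes had a 'Date' column, so that lst has two elements).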
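The difference between the two loops in the question can be checked directly. This is a small sketch using hypothetical sample data (the column x and the flag names are made up for illustration), contrasting rebinding the loop variable with mutating the object it refers to:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df_train = pd.DataFrame({"Date": ["2020-01-01"], "x": [1]})
df_test = pd.DataFrame({"Date": ["2020-01-02"], "x": [2]})

for data in [df_train, df_test]:
    # drop() returns a new dataframe; this only rebinds the name `data`.
    data = data.drop(["Date"], axis=1)

# The originals still have the column after the assignment loop.
rebind_kept_date = "Date" in df_train.columns

for data in [df_train, df_test]:
    # inplace=True modifies the object that df_train / df_test refer to.
    data.drop(["Date"], axis=1, inplace=True)

# Now the column is gone from the originals.
inplace_kept_date = "Date" in df_train.columns

print(rebind_kept_date, inplace_kept_date)
```

The same holds for any Python object: assigning to a loop variable never changes what the list elements or other names refer to, while calling a mutating method on it does affect the shared object.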

Upvotes: 2
