Reputation: 1576
I have two dataframes (df_train and df_test) containing a column ('Date') that I want to drop.
As far as I understand, I can do it in two ways: either by using inplace or by assigning the result back to the dataframe, like:
if 'Date' in df_train.columns:
    df_train.drop(['Date'], axis=1, inplace=True)
OR
if 'Date' in df_train.columns:
    df_train = df_train.drop(['Date'], axis=1)
Both methods work on a single dataframe, but the former should be more memory-friendly, since the assignment creates a copy of the dataframe.
I have to do this for both dataframes, so I tried to do the same within a loop:
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data.drop(['Date'], axis=1, inplace=True)
and
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data = data.drop(['Date'], axis=1)
The weird thing is that, in this case, only the first way (using inplace) works. If I use the second way, the 'Date' columns aren't dropped. Why is that?
Upvotes: 1
Views: 88
Reputation: 34086
It's better to use a list comprehension:
res = [data.drop(['Date'], axis=1) for data in [df_train, df_test] if 'Date' in data.columns]
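Here, res holds new copies of the dataframes with 'Date' dropped; df_train and df_test themselves are unchanged. Note that any dataframe without a 'Date' column is filtered out of res entirely.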
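If the goal is to keep using the original names, the results can be unpacked back into them. This is a minimal sketch, assuming pandas 0.21 or later (for the columns= keyword) and small made-up frames. Passing errors='ignore' makes the drop a no-op when 'Date' is absent, so no dataframe is left out of the result:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df_train = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'x': [1, 2]})
df_test = pd.DataFrame({'x': [3, 4]})  # no 'Date' column here

# drop() returns a new dataframe; errors='ignore' skips columns that are missing.
df_train, df_test = [
    data.drop(columns=['Date'], errors='ignore')
    for data in (df_train, df_test)
]

print(list(df_train.columns))
print(list(df_test.columns))
```

The tuple unpacking on the left rebinds both names in one statement, which is what the plain loop assignment cannot do.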
Upvotes: 1
Reputation: 71610
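It doesn't work because the assignment inside the loop only rebinds the loop variable data to the new dataframe returned by drop. It changes neither the list nor the dataframes that df_train and df_test refer to. The inplace version works because it modifies the existing objects instead of creating new ones. To keep the results of the non-inplace version, collect them in a list: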
lst = []
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        lst.append(data.drop(['Date'], axis=1))
print(lst)
Now lst contains the dataframes with the 'Date' column dropped.
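The difference between the two loops in the question can be checked directly. This is a small sketch with made-up frames (the names a and b are hypothetical): assigning to the loop variable leaves the original objects untouched, while an in-place drop mutates them.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
a = pd.DataFrame({'Date': [1], 'x': [2]})
b = pd.DataFrame({'Date': [3], 'x': [4]})

for data in [a, b]:
    # Rebinds the local name `data` only; a and b still have 'Date'.
    data = data.drop(['Date'], axis=1)

after_rebind = ['Date' in a.columns, 'Date' in b.columns]

for data in [a, b]:
    # Mutates the very objects that a and b refer to.
    data.drop(['Date'], axis=1, inplace=True)

after_inplace = ['Date' in a.columns, 'Date' in b.columns]

print(after_rebind)   # [True, True]
print(after_inplace)  # [False, False]
```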
Upvotes: 2