Reputation: 6860
I have some code like the following:
df = .....  # load a very large dataframe
good_columns = set(['a','b',........])  # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        df = df.drop(col, 1)
The odd thing is that it successfully drops the first column that is not good, so it isn't a case of holding the old and new dataframe in memory at the same time and running out of space. It fails with a MemoryError when the second column is dropped. This makes me suspect there is some kind of memory leak. How can I prevent this error from happening?
Upvotes: 1
Views: 4839
Reputation: 177
I tried the inplace=True argument but still had the same issue. There is another solution that deals with the memory leak being caused by your architecture; it helped me when I ran into this same problem.
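For reference, a minimal sketch of what the in-place version of the loop looks like (the dataframe and column names here are placeholders, not the asker's actual data):

    import pandas as pd

    # hypothetical small dataframe standing in for the very large one
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
    good_columns = {'a', 'b'}

    # drop each unwanted column in place instead of rebinding df to a new copy
    for col in list(df.columns):
        if col not in good_columns:
            df.drop(columns=col, inplace=True)

As noted above, this alone did not resolve the MemoryError for me, which is why the architecture-related fix mattered.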
Upvotes: 0
Reputation: 568
Make use of the usecols argument while reading the large dataframe so that only the columns you want are loaded, instead of dropping the others later on. See: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
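A minimal sketch, assuming the dataframe is loaded from a CSV file (the file name and column names are placeholders; the question does not say how the data is loaded):

    import pandas as pd

    # only these columns are parsed, so the unwanted ones
    # never occupy memory in the first place
    good_columns = ['a', 'b']
    df = pd.read_csv('large_file.csv', usecols=good_columns)
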
Upvotes: 1