Reputation: 6860
I have some code like the following:
df = .....  # load a very large dataframe
good_columns = set(['a','b',........])  # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        df = df.drop(col, 1)
The odd thing is that it successfully drops the first column that is not good, so it isn't a case of holding the old and new dataframe in memory at the same time and running out of space. It fails with a MemoryError when the second column is dropped. This makes me suspect there is some kind of memory leak. How can I prevent this error from happening?
Upvotes: 1
Views: 4839
Reputation: 177
I tried the inplace=True argument but still had the same issue. There is another solution that deals with the memory leak being caused by your architecture; it helped me when I ran into this same problem.
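For reference, a minimal sketch of what the in-place version of the loop looks like (the dataframe and column names here are placeholders, not the asker's actual data):

    import pandas as pd

    # hypothetical small dataframe standing in for the very large one
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
    good_columns = {'a', 'b'}

    # drop each unwanted column in place instead of rebinding df to a new copy
    for col in list(df.columns):
        if col not in good_columns:
            df.drop(columns=col, inplace=True)

As noted above, this alone did not resolve the MemoryError for me, which is why the architecture-related fix mattered.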
Upvotes: 0
Reputation: 568
Make use of the usecols argument while reading the large dataframe so that only the columns you want are loaded, instead of dropping the others later on. See: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
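A minimal sketch, assuming the dataframe is loaded from a CSV file (the file name and column names are placeholders; the question does not say how the data is loaded):

    import pandas as pd

    # only these columns are parsed, so the unwanted ones
    # never occupy memory in the first place
    good_columns = ['a', 'b']
    df = pd.read_csv('large_file.csv', usecols=good_columns)
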
Upvotes: 1