user1559897

What does Pandas dataframe copy do?

My question is basically does dataframe.copy() use copy-on-write?

I am guessing (and I am probably wrong) that when someone calls dataframe.copy(), it is calling malloc somewhere to allocate virtual memory for the new dataframe. I believe malloc doesn't initialize the virtual memory, so it is copy-on-write and no physical data movement happens. This implies there is no real copy of the dataframe when copy() is called.

However, calling dataframe.copy() does take time and increase my memory footprint. So it looks like it is indeed making a physical copy of the data. Where am I wrong in my reasoning?

Answers (1)

Alexander Pivovarov

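With the default deep=True, copy() allocates new memory and copies the data into it right away, so the memory footprint increases immediately. Neither .copy(deep=True) nor .copy(deep=False) does copy-on-write: with deep=False both DataFrames share the same underlying data, and with deep=True the data is duplicated. (This describes pandas without its Copy-on-Write mode, which is available as an option since pandas 2.0 and is the default in pandas 3.0. With that mode, a deep=False copy is shared lazily and only duplicated when one side is modified.) The fact that malloc doesn't initialize new memory would only matter if memory were allocated and never written to. copy() writes every value as soon as the buffer exists, so those pages are touched and count against your footprint. Lazily backed pages are also not the same thing as copy-on-write: a fresh allocation isn't backed by the source data, so the data still has to be physically copied into it.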
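
A minimal sketch of the difference, assuming a hypothetical DataFrame with a single int64 column named "a". np.shares_memory reports whether two arrays overlap in memory: the deep copy should get its own buffer, while the shallow copy points at the original one. Whether a later write to the original shows up in the shallow copy depends on the pandas version and on whether Copy-on-Write mode is enabled, so the sketch only checks buffer sharing.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one int64 column of a million values.
df = pd.DataFrame({"a": np.arange(1_000_000, dtype=np.int64)})

deep = df.copy()               # deep=True is the default: new buffer, data copied now
shallow = df.copy(deep=False)  # new DataFrame object, same underlying data

# For a NumPy-backed int64 column, to_numpy() returns the column's buffer
# without copying, so we can compare the buffers directly.
deep_shares = np.shares_memory(df["a"].to_numpy(), deep["a"].to_numpy())
shallow_shares = np.shares_memory(df["a"].to_numpy(), shallow["a"].to_numpy())

print("deep copy shares memory:", deep_shares)        # False
print("shallow copy shares memory:", shallow_shares)  # True
```

The deep copy is the one that costs time and memory proportional to the size of the data; the shallow copy only creates new wrapper objects.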

Columns are just Series objects, which for the usual numeric dtypes are backed by NumPy ndarrays, and NumPy arrays don't do copy-on-write either (see NumPy Array Copy-On-Write).
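
The same behaviour can be seen directly in NumPy, sketched here with a hypothetical five-element array. ndarray.copy() allocates a new buffer and fills it immediately, while slicing returns a view that shares the original buffer, so writes through the view land in the original. Neither path defers the copy until a write, which is what copy-on-write would do.

```python
import numpy as np

base = np.arange(5)   # [0, 1, 2, 3, 4]
copied = base.copy()  # new buffer, data copied eagerly
view = base[:]        # new array object, same buffer as base

view[0] = 100         # writes straight into base's buffer
copied[1] = -1        # writes only into the copy's own buffer

print(base[0], base[1])                # 100 1
print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(view.base is base)               # True
```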
