Reputation: 1484
My question is basically: does dataframe.copy() use copy-on-write?
I am guessing (and I am probably wrong) that when someone calls dataframe.copy(), it is calling malloc somewhere to allocate virtual memory for the new dataframe. I believe malloc doesn't initialize the virtual memory, so it is copy-on-write and no physical data movement happens. This implies there is no real copy of the dataframe when copy() is called.
However, calling dataframe.copy() does take time and increase my memory footprint. So it looks like it is indeed making a physical copy of the data. Where am I wrong in my reasoning?
Upvotes: 2
Views: 1790
Reputation: 4990
With the default deep=True, it definitely allocates new memory, and it also copies the data there right away, so the memory footprint increases immediately. It doesn't do copy-on-write for either .copy(deep=True) or .copy(deep=False). With deep=False, both DataFrames use the same data; with deep=True, the data is copied. The fact that malloc doesn't initialize new memory would only be relevant here if you allocated memory and never put anything there.
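To check the deep/shallow difference yourself, here is a minimal sketch. The column name "a", the array size, and the helper shares_buffer are made up for illustration. It only checks whether the underlying buffers overlap right after the copy. On newer pandas versions that have the Copy-on-Write mode enabled, writes through a shallow copy behave differently, so this sketch deliberately does not modify either frame.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame, large enough that a real copy is noticeable.
df = pd.DataFrame({"a": np.arange(1_000_000, dtype="int64")})

deep = df.copy()               # deep=True is the default
shallow = df.copy(deep=False)  # new DataFrame object over the same data


def shares_buffer(left, right, column="a"):
    """Return True if the two frames' column arrays overlap in memory."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())


deep_shares = shares_buffer(df, deep)
shallow_shares = shares_buffer(df, shallow)

print("deep copy shares memory with original:   ", deep_shares)
print("shallow copy shares memory with original:", shallow_shares)
```

If the deep copy reports no shared memory, the data has already been duplicated by the time copy() returns, which matches the time and memory cost seen in the question.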
Columns are just Series objects backed by NumPy ndarrays, and those don't do copy-on-write (see NumPy Array Copy-On-Write).
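The same thing can be seen at the NumPy level. A rough sketch, with arbitrary array size and variable names: ndarray.copy() writes into a fresh buffer immediately, while a view keeps pointing at the original buffer.

```python
import numpy as np

original = np.zeros(1_000_000, dtype="float64")

copied = original.copy()  # allocates a new buffer and fills it right away
viewed = original.view()  # new array object over the same buffer

copied[0] = 1.0           # touches only the copy
viewed[1] = 2.0           # writes through to the original

print(np.shares_memory(original, copied))  # False
print(np.shares_memory(original, viewed))  # True
print(original[:2])                        # [0. 2.]
```

Because copy() fills the new buffer as part of the call, the operating system has to back those pages with physical memory straight away, so lazy allocation by malloc does not help here.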
Upvotes: 2