gustavz

Reputation: 3160

Truly deep copying Pandas DataFrames

I came across something I find very odd: apparently it is not possible to truly deep copy pandas DataFrames.

I would expect that if I create a deep copy of a DataFrame and modify data in the copy, it has no effect on the original DataFrame. But apparently this is not the case, and may not even be possible, if I'm not mistaken.

Code to reproduce:

import pandas as pd

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])

def pop(df_in):
    df = df_in.copy()
    print(df['sets'].apply(lambda x: set([x.pop()])))

pop(df)
pop(df)
pop(df)

>>> KeyError: 'pop from an empty set'

or

import copy
import pandas as pd

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])

def pop(df_in):
    df = copy.deepcopy(df_in)
    print(df['sets'].apply(lambda x: set([x.pop()])))

pop(df)
pop(df)
pop(df)

>>> KeyError: 'pop from an empty set'

My questions are:

  1. Is it possible to create true deep copies of pandas dataframes?
  2. If not, why? If yes, how?

Upvotes: 6

Views: 2398

Answers (2)

ALollz

Reputation: 59579

The problem is that your objects are sets, which are mutable. The documentation explicitly calls out this behavior with a warning (emphasis my own):

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object.

As always with references to mutable objects, changing the object through one reference changes it everywhere it is referenced. We can see for ourselves that, despite the deep copy, the objects have the same ID.

import pandas as pd
df = pd.DataFrame({'sets': [{1,2}]}, index=[0])
df1 = df.copy(deep=True)

id(df['sets'].iloc[0])
#4592957024

id(df1['sets'].iloc[0])
#4592957024
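
To make the consequence concrete, here is a minimal sketch (the variable names are my own, not from the question). It checks identity with `is` instead of comparing printed ids, and it also covers copy.deepcopy(df), which as far as I know simply delegates to df.copy(deep=True), so it should behave the same way:

```python
import copy
import pandas as pd

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])
df_copy = df.copy(deep=True)
df_deepcopy = copy.deepcopy(df)

original_set = df['sets'].iloc[0]

# Both "deep" copies still hold a reference to the very same set object
same_after_copy = df_copy['sets'].iloc[0] is original_set
same_after_deepcopy = df_deepcopy['sets'].iloc[0] is original_set

# Mutating the set through the copy is therefore visible in the original
df_copy['sets'].iloc[0].add(3)
original_after_mutation = df['sets'].iloc[0]

print(same_after_copy, same_after_deepcopy, original_after_mutation)
```

Only the array of references is copied; the sets those references point to are shared between all three frames.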

Upvotes: 1

Quang Hoang

Reputation: 150785

One way is to convert df_in to a Python dictionary, which works better with copy.deepcopy:

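import copy
import pandas as pd

def pop(df_in):
    # Deep copy the plain dict of cells, then rebuild the DataFrame from it
    df = pd.DataFrame(copy.deepcopy(df_in.to_dict()))
    print(df['sets'].apply(lambda x: set([x.pop()])))

for i in range(3):
    pop(df)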

Output:

0    {1}
Name: sets, dtype: object
0    {1}
Name: sets, dtype: object
0    {1}
Name: sets, dtype: object
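
An alternative sketch, assuming the column names are unique and every object cell supports copy.deepcopy: copy the frame as usual, then deep copy each cell of the object-dtype columns. Note that deep_copy_frame is a hypothetical helper name, not a pandas API:

```python
import copy
import pandas as pd

def deep_copy_frame(df_in):
    """Hypothetical helper: copy the frame, then deep copy every object-dtype cell."""
    df_out = df_in.copy(deep=True)
    for col in df_out.columns:
        # Only object columns can hold references to mutable Python objects
        if df_out[col].dtype == object:
            df_out[col] = df_out[col].map(copy.deepcopy)
    return df_out

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])
df_copy = deep_copy_frame(df)

# Popping from the copy's set leaves the original set untouched
df_copy['sets'].iloc[0].pop()
print(df['sets'].iloc[0], df_copy['sets'].iloc[0])
```

Unlike the to_dict round trip, this keeps the original index and the dtypes of the non-object columns as they are, at the cost of a Python-level loop over every object cell.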

Upvotes: 2
