Reputation: 3160
I came across something I find very odd: apparently it is not possible to truly deep copy pandas DataFrames.
I would expect that if I create a deep copy of a DataFrame and modify data in this copy, it has no effect on the original DataFrame. But apparently this is not the case, and may not even be possible, if I'm not wrong.
Code to reproduce:
import pandas as pd

df = pd.DataFrame({'sets': set([1, 2])}, index=[0])

def pop(df_in):
    df = df_in.copy()
    print(df['sets'].apply(lambda x: set([x.pop()])))

pop(df)
pop(df)
pop(df)
>>> KeyError: 'pop from an empty set'
or
import copy
import pandas as pd

df = pd.DataFrame({'sets': set([1, 2])}, index=[0])

def pop(df_in):
    df = copy.deepcopy(df_in)
    print(df['sets'].apply(lambda x: set([x.pop()])))

pop(df)
pop(df)
pop(df)
>>> KeyError: 'pop from an empty set'
My questions are: why does this happen, and is there a way to create a truly independent deep copy of a DataFrame?
Upvotes: 6
Views: 2398
Reputation: 59579
The problem is that your objects are mutable, as they are sets. The documentation explicitly calls out this behavior with a warning (emphasis my own):
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object.
So, as always with references to mutable objects, changing the object in one place changes it everywhere. We can see for ourselves that, despite the deep copy, the objects have the same id.
import pandas as pd

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])
df1 = df.copy(deep=True)

id(df['sets'].iloc[0])
#4592957024

id(df1['sets'].iloc[0])
#4592957024
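If you need the inner objects duplicated as well, one possible workaround (a sketch, assuming you are willing to pay the cost of copying every cell) is to deep-copy each element yourself on top of the pandas copy:

import copy
import pandas as pd

df = pd.DataFrame({'sets': [{1, 2}]}, index=[0])

# Copy the frame, then replace each cell with a deep copy of its object,
# so the sets are no longer shared with the original.
df1 = df.copy(deep=True)
df1['sets'] = df1['sets'].apply(copy.deepcopy)

id(df['sets'].iloc[0]) == id(df1['sets'].iloc[0])
# False

df1['sets'].iloc[0].pop()
df['sets'].iloc[0]
# {1, 2} -- the original set is untouched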
Upvotes: 1
Reputation: 150785
One way is to convert df_in to a plain Python dictionary, which copy.deepcopy copies recursively, sets included:
def pop(df_in):
    df = pd.DataFrame(copy.deepcopy(df_in.to_dict()))
    print(df['sets'].apply(lambda x: set([x.pop()])))

for i in range(3): pop(df)
Output:
0 {1}
Name: sets, dtype: object
0 {1}
Name: sets, dtype: object
0 {1}
Name: sets, dtype: object
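As a quick check (reusing df and pop from above), the original frame still holds the full set after the three calls, because deep-copying the plain dictionary also copies the sets inside it:

# The original frame is untouched by the three pop(df) calls:
print(df['sets'].iloc[0])
# {1, 2}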
Upvotes: 2