Reputation: 598
I have a pandas DataFrame that I would like to duplicate, so that I can run some operations on the copy without affecting the original. I use the .copy() method, but for some reason it doesn't work. Here is my code:
import pandas as pd
import numpy as np
x = np.array([1,2])
df = pd.DataFrame({'A': [x, x, x], 'B': [4, 5, 6]})
duplicate = df.copy()
duplicate['A'].values[0][[0,1]] = 0
print(duplicate)
print(df)
        A  B
0  [0, 0]  4
1  [0, 0]  5
2  [0, 0]  6
        A  B
0  [0, 0]  4
1  [0, 0]  5
2  [0, 0]  6
As you can see, df (the original DataFrame) is affected as well. Does anyone know why, and how this should be done correctly?
Upvotes: 0
Views: 490
Reputation: 9018
The problem is in the values stored in column A rather than in the DataFrame itself. df.copy() makes a deep copy by default, but it does not recursively copy the Python objects held in the cells. For an object column such as A, whose cells hold NumPy arrays, only the references are copied, so both frames still point at the same array x. You can also tell that all three rows share that one array: you only tried to modify the first row, yet every value of A in your duplicate changed.
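If you want to check this yourself, here is a small sketch. The variable names same_within_df and shared_with_copy are just mine for illustration; np.shares_memory reports whether two arrays are backed by the same data.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2])
df = pd.DataFrame({'A': [x, x, x], 'B': [4, 5, 6]})
duplicate = df.copy()  # deep=True is the default

# Rows 0 and 1 of the original column A are backed by the same array data.
same_within_df = np.shares_memory(df['A'].iloc[0], df['A'].iloc[1])

# The cell in the copy is still backed by the array used in the original.
shared_with_copy = np.shares_memory(duplicate['A'].iloc[0], df['A'].iloc[0])

# Column B holds plain integers, so the copy really is independent there.
duplicate.loc[0, 'B'] = 99

print(same_within_df, shared_with_copy)
print(df.loc[0, 'B'], duplicate.loc[0, 'B'])
```

Both checks should come back true, while the change to B stays in the duplicate only. So the copy behaves as expected for ordinary columns, and only the arrays inside the cells are shared.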
The proper way is probably:
import pandas as pd
import numpy as np
from copy import deepcopy  # needed to copy the arrays stored in the cells
x = np.array([1,2])
df = pd.DataFrame({'A': [x, x, x], 'B': [4, 5, 6]})
duplicate = df.copy()
duplicate['A'] = duplicate['A'].apply(deepcopy)  # give every cell its own copy of the array
duplicate['A'].values[0][[0,1]] = 0
print(duplicate)
print(df)
        A  B
0  [0, 0]  4
1  [1, 2]  5
2  [1, 2]  6
        A  B
0  [1, 2]  4
1  [1, 2]  5
2  [1, 2]  6
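If you have several columns like this, the same idea could be wrapped in a small helper. This is only a sketch: copy_with_objects is a name I made up, not a pandas function, and it assumes that every object-dtype column holds values that deepcopy can handle and that column names are unique.

```python
from copy import deepcopy

import numpy as np
import pandas as pd


def copy_with_objects(frame):
    """Hypothetical helper: copy a frame, then deep-copy each cell of object columns."""
    result = frame.copy()
    for col in result.columns:
        # Only object-dtype columns can hold shared Python objects.
        if result[col].dtype == object:
            result[col] = result[col].apply(deepcopy)
    return result


x = np.array([1, 2])
df = pd.DataFrame({'A': [x, x, x], 'B': [4, 5, 6]})
duplicate = copy_with_objects(df)

# Modify the first cell of the copy in place.
duplicate['A'].iloc[0][:] = 0

print(duplicate)
print(df)
```

Note that deepcopy runs once per cell, so this may be slow on large frames. If the arrays all have the same length, storing them as separate numeric columns would avoid the problem entirely.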
Upvotes: 3