Reputation: 355
If I bind a second name to a list, I know that the last line of the code below changes the value seen through both a and b:
a = [1,2,3]
b = a
b[1] = 4
Hence one right way to do it is to use b = a[:]. That way, changing values of b will NOT affect values of a.
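A minimal sketch of the list behaviour described above, in plain Python. The variable names are illustrative and not from the original post. The last part is an extra observation: a list slice is itself only a shallow copy, so nested objects are still shared.

```python
# Plain assignment binds a second name to the same list object.
original = [1, 2, 3]
alias = original
alias[1] = 4
print(original)  # [1, 4, 3]

# Slicing builds a new outer list, so item assignment stays local to the copy.
source = [1, 2, 3]
sliced = source[:]
sliced[1] = 4
print(source)  # [1, 2, 3]
print(sliced)  # [1, 4, 3]

# The slice is still shallow: the inner lists are shared by both outer lists.
nested = [[1, 2], [3, 4]]
nested_copy = nested[:]
nested_copy[0][0] = 99
print(nested)  # [[99, 2], [3, 4]]
```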
However, the same doesn't seem to be true for pandas series or dataframes:
import pandas as pd
a = pd.DataFrame({1: [2,3,4], 2: [3,4,5]})
b = a[:]
b.loc[2,2] = 10
The last line of code will change both b and a.
Can someone explain why there is a difference here? Also, what is the right way to create a new series/dataframe that can be changed without affecting the original? Should I ALWAYS use b = a.copy(deep=True)?
Upvotes: 2
Views: 3344
Reputation: 402813
a[:] creates a shallow copy. With a shallow copy, the underlying data and indices are borrowed from the original: for performance reasons, the underlying NumPy array data is the same. That's why deep=True is the default when you use a.copy(): you don't have to worry about modifying the original, since the underlying data is also replicated. With a[:], it is assumed you know what you're doing.
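A short sketch of the difference, reusing the frame from the question. The names c and original_after_deep_copy are illustrative. One assumption to flag: whether the write through a[:] reaches a depends on the pandas version and on whether Copy-on-Write is enabled, and older versions may also emit a SettingWithCopyWarning here, so that part only prints the value instead of claiming a result.

```python
import pandas as pd

a = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# copy() defaults to deep=True, so the underlying data is duplicated.
b = a.copy()
b.loc[2, 2] = 10
original_after_deep_copy = a.loc[2, 2]
print(original_after_deep_copy)  # 5: the original is untouched
print(b.loc[2, 2])               # 10

# a[:] is only a shallow copy. Whether this write is visible through `a`
# depends on the pandas version / Copy-on-Write setting (assumption: check
# the output on your own installation).
c = a[:]
c.loc[2, 2] = 10
print(a.loc[2, 2])
```

If the goal is a frame that can always be modified safely, a.copy() is the explicit choice, and passing deep=True only spells out the default.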
Upvotes: 5