copy pandas dataframe and series

Question

If I bind a second name to a list, I know that the last line of the code below changes the values seen through both a and b:

a = [1,2,3]
b = a
b[1] = 4

Hence one right way to do it is to use b = a[:]. That way, changing values in b will NOT affect the values in a.

However, the same doesn't seem to be true for pandas Series or DataFrames:

import pandas as pd
a = pd.DataFrame({1: [2,3,4], 2: [3,4,5]})
b = a[:]
b.loc[2,2] = 10

The last line of code changes both b and a. Can someone explain why there is a difference here? Also, what is the right way to create a new Series/DataFrame without affecting the original? Should I ALWAYS use b = a.copy(deep=True)?

cs95 · Accepted Answer

a[:] creates a shallow copy. With a shallow copy, the underlying data and indices are borrowed from the original: for performance reasons, both objects share the same underlying NumPy array data. That's why deep=True is the default for a.copy(): the underlying data is replicated as well, so you don't have to worry about modifying the original. With a[:], it is assumed you know what you're doing.
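As a rough illustration using the DataFrame from the question, here is a minimal sketch of the two kinds of copy. The name c is introduced only for this example. One assumption to flag: whether a write through a shallow copy is visible in the original depends on your pandas version and settings (with Copy-on-Write enabled, it is not), so the sketch only claims output for the deep copy.

```python
import pandas as pd

a = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# Deep copy (the default for .copy()): the data is duplicated,
# so writing to b cannot reach a.
b = a.copy()
b.loc[2, 2] = 10
print(a.loc[2, 2])  # 5
print(b.loc[2, 2])  # 10

# Shallow copy: a new DataFrame object that borrows a's data.
# Whether a later write to c shows up in a depends on the pandas
# version and the Copy-on-Write setting, so no output is claimed here.
c = a.copy(deep=False)
print(c is a)  # False: c is a distinct object
```

So if you need a fully independent object, a.copy() on its own is enough; spelling out deep=True is optional.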
