Reputation: 53
I have a very simple df, which I'm going to make a shallow copy of.
import pandas as pd

# assign dataframe
old_df = pd.DataFrame({'values': [10, 20, 30, 40]})
# shallow copy
copy_df = old_df.copy(deep=False)
I understand that with a shallow copy, changes made to the copy should carry over to the original. So if I add a new column to copy_df, I'd expect old_df to gain that column as well.
I tried creating the new column in two ways.
# method 1
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0]
# method 2
copy_df['new_col'] = [0, 0, 0, 0]
My expected result is as follows:
>>> old_df
   values  new_col
0      10        0
1      20        0
2      30        0
3      40        0
But what I get, from both methods, is the original, unchanged df:
>>> old_df
   values
0      10
1      20
2      30
3      40
I would like to ask why the change I made to the shallow copy is not carrying over to the original.
Upvotes: 0
Views: 65
Reputation: 120479
I understand that with a shallow copy, changes made to the copy should carry over to the original. So if I add a new column to copy_df, I'd expect old_df to gain that column as well.
Yes, this is true for every column (Series) that existed before the copy. A new column, however, is added only to the DataFrame you assign it to: the two DataFrames share references to the existing columns' data, but each keeps its own set of columns.
# Create new column
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0] # or copy_df['new_col'] = [0, 0, 0, 0]
print(old_df)
# Output
   values
0      10
1      20
2      30
3      40
# Modify existing column
copy_df.loc[[1, 2], 'values'] = 0
print(old_df)
# Output
   values
0      10
1       0
2       0
3      40
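As a rough analogy only (pandas internals are more involved than this), a shallow copy of a plain dict of lists behaves the same way: the copy is a new container that points at the same list objects. The names old and shallow below are made up for illustration.

```python
import copy

# The "original": one column stored as a list
old = {'values': [10, 20, 30, 40]}

# Shallow copy: a new dict whose values are the SAME list objects
shallow = copy.copy(old)

# Adding a key changes only the copy's own set of keys
shallow['new_col'] = [0, 0, 0, 0]

# Mutating a shared list in place is visible through both dicts
shallow['values'][1] = 0

print(old)      # {'values': [10, 0, 30, 40]}
print(shallow)  # {'values': [10, 0, 30, 40], 'new_col': [0, 0, 0, 0]}
```

The container is copied, the contents are shared. That is why a new column never shows up in old_df while an in-place edit of an existing column can.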
Upvotes: 1
Reputation: 178
This is the expected behaviour now after pandas v1.4: https://github.com/pandas-dev/pandas/issues/47703
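Since the behaviour depends on the pandas version, it may be worth checking what your own installation does. The snippet below is a minimal sketch, not an official recipe: it reports whether the two frames share memory before any write, whether the new column leaked into the original, and whether a write to an existing column propagated. The variable names shares_before, new_col_leaked and write_propagated are made up for illustration. As far as I know, the last result can be False when Copy-on-Write is active.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({'values': [10, 20, 30, 40]})
copy_df = old_df.copy(deep=False)

# Before any write, check whether both frames are backed by the same buffer
shares_before = np.shares_memory(
    old_df['values'].to_numpy(), copy_df['values'].to_numpy()
)

# Adding a column only changes the copy's own set of columns
copy_df['new_col'] = [0, 0, 0, 0]
new_col_leaked = 'new_col' in old_df.columns

# Writing into an existing column may or may not reach the original,
# depending on the pandas version and on whether Copy-on-Write is enabled
copy_df.loc[[1, 2], 'values'] = 0
write_propagated = old_df['values'].tolist() == [10, 0, 0, 40]

print('pandas version:  ', pd.__version__)
print('shares memory:   ', shares_before)
print('new_col leaked:  ', new_col_leaked)
print('write propagated:', write_propagated)
```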
Upvotes: 1