hjyoo99

Reputation: 53

Adding a new column to a df is not carrying over to a shallow copy of the df

I have a very simple df, which I'm going to make a shallow copy of.

import pandas as pd

# assign dataframe
old_df = pd.DataFrame({'values': [10, 20, 30, 40]})

# shallow copy
copy_df = old_df.copy(deep=False)

I understand that with a shallow copy, changes made to the copy should carry over to the original. So if I add a new column to copy_df, I'd expect the same column to appear in old_df as well.

I tried creating the new column using two methods.

# method 1
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0]
# method 2
copy_df['new_col'] = [0, 0, 0, 0]

My expected result is as follows:

>>> old_df
   values  new_col
0      10        0
1      20        0
2      30        0
3      40        0

But what I get, from both methods, is the original, unchanged df:

>>> old_df
   values
0      10
1      20
2      30
3      40

I would like to ask why the change I made to the shallow copy is not carrying over to the original.

Upvotes: 0

Views: 65

Answers (2)

Corralien

Reputation: 120479

"I understand that in a shallow copy, the changes made to one should carry over to the original. So if I create a new column (change) to the copy_df, I'd expect the change to be made to the old_df as well."

Yes, this is true for every column (Series) that existed before the copy, because both DataFrames hold references to the same underlying column data. A new column, however, is added only to the DataFrame you assign it to: the two DataFrames share the existing columns, not the set of columns itself.

# Create new column
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0]  # or copy_df['new_col'] = [0, 0, 0, 0]
print(old_df)

# Output
   values
0      10
1      20
2      30
3      40

# Modify existing column
copy_df.loc[[1, 2], 'values'] = 0
print(old_df)

# Output
   values
0      10
1       0
2       0
3      40
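
The distinction between shared column data and per-DataFrame columns can also be checked directly. The following is only a minimal sketch, not part of the session above: it assumes NumPy is available and uses np.shares_memory on the arrays returned by to_numpy(); the variable name shared is introduced purely for illustration. I would expect the memory check to report that the existing column is shared, while the column lists show that new_col exists only on copy_df.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({'values': [10, 20, 30, 40]})
copy_df = old_df.copy(deep=False)

# Check whether the existing column is backed by the same memory in both DataFrames
shared = np.shares_memory(old_df['values'].to_numpy(),
                          copy_df['values'].to_numpy())
print(shared)

# A new column is attached to copy_df only; old_df keeps its own column index
copy_df['new_col'] = [0, 0, 0, 0]
print(list(old_df.columns))   # ['values']
print(list(copy_df.columns))  # ['values', 'new_col']
```

If the column is needed in both DataFrames, the simplest option is probably to add it to old_df before taking the shallow copy.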

Upvotes: 1

HL03

Reputation: 178

This is the expected behaviour now after pandas v1.4: https://github.com/pandas-dev/pandas/issues/47703
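
Version matters in a related way for edits to existing values. The pandas documentation for DataFrame.copy notes that with Copy-on-Write (the default from pandas 3.0), changes to the data of a shallow copy are no longer reflected in the original. The snippet below is a hedged sketch for checking which behaviour your installation shows; the variable name propagated is made up for this example, and no particular result is assumed.

```python
import pandas as pd

old_df = pd.DataFrame({'values': [10, 20, 30, 40]})
copy_df = old_df.copy(deep=False)

# Modify existing values through the shallow copy
copy_df.loc[[1, 2], 'values'] = 0

# True if the edit reached old_df, False if Copy-on-Write kept old_df unchanged
propagated = old_df['values'].tolist() == [10, 0, 0, 40]
print(pd.__version__, propagated)
```

Either way, a column added to copy_df will not show up in old_df.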

Upvotes: 1
