Reid

Reputation: 39

Updating a value in a Pandas dataframe seems to update all dataframes

I've built two Pandas dataframes like this:

import panda as pd
d = {'FIPS' : pd.Series(['01001', '01002']), 'count' : pd.Series([3, 4])}
df1 = pd.DataFrame(d)
df2 = df1

I want to change one of the values in df2. This is what I've tried:

df2.loc[df2['FIPS'] == '01001','FIPS'] = '01003' 

This line appears to update both df1 and df2, but I don't understand why.
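A runnable reproduction of the behaviour described above, with the import spelled `pandas`. The `is` check confirms that `df2 = df1` did not create a second DataFrame.

```python
import pandas as pd  # corrected from "import panda"

d = {'FIPS': pd.Series(['01001', '01002']), 'count': pd.Series([3, 4])}
df1 = pd.DataFrame(d)
df2 = df1  # no copy is made: both names refer to one DataFrame

# In-place update through df2
df2.loc[df2['FIPS'] == '01001', 'FIPS'] = '01003'

print(df2 is df1)            # True
print(df1['FIPS'].tolist())  # ['01003', '01002'], the change shows up through df1 too
```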

Upvotes: 0

Views: 487

Answers (2)

Jan Sila

Reputation: 1593

Because df2 is only a reference to df1: both names point to the same object in memory. If you do df2 = df1.copy(), a new DataFrame is created for df2, and only that copy gets updated. Also, you have a typo in the import: it should be import pandas, not import panda :)
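A minimal sketch of that fix, reusing the question's data: after df2 = df1.copy(), the same .loc assignment changes df2 only.

```python
import pandas as pd

d = {'FIPS': pd.Series(['01001', '01002']), 'count': pd.Series([3, 4])}
df1 = pd.DataFrame(d)

# .copy() (deep=True by default) builds an independent DataFrame for df2
df2 = df1.copy()
df2.loc[df2['FIPS'] == '01001', 'FIPS'] = '01003'

print(df1['FIPS'].tolist())  # ['01001', '01002'], original untouched
print(df2['FIPS'].tolist())  # ['01003', '01002']
```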

You can check which object each name refers to with id(): id(df1) and id(df2) return the same value (in CPython, the object's memory address), and they differ once you use the .copy() method.
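The id() check can be sketched like this; the `is` operator performs the same identity comparison directly. The names `alias` and `copied` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'FIPS': ['01001', '01002'], 'count': [3, 4]})

alias = df1          # plain assignment: a second name for the same object
copied = df1.copy()  # a new, independent DataFrame

print(id(alias) == id(df1))   # True: same object
print(id(copied) == id(df1))  # False: different object
print(alias is df1, copied is df1)  # True False
```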

Welcome to SO!

Upvotes: 1

Alex L

Reputation: 1114

Instead of df2 = df1, say df2 = df1.copy().

The issue is that Python variables act like "pointers": assignment stores a reference to an object rather than a copy of its value. This holds for every object, but the effect only shows up with mutable ones, such as DataFrames, that can be changed in place. So in your code above, df2 becomes just another name, or alias, for df1. Hence the unexpected change.
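The same aliasing happens with any mutable Python object, not only DataFrames. This sketch uses a plain list: mutating through one name is visible through the other, while rebinding a name to a new object leaves the other name alone.

```python
a = [1, 2, 3]
b = a          # b is another name for the same list
b[0] = 99      # in-place mutation, visible through both names
print(a)       # [99, 2, 3]

b = [7, 8, 9]  # rebinding: b now refers to a new list; a is unaffected
print(a)       # [99, 2, 3]
print(b)       # [7, 8, 9]
```

The .loc assignment in the question is the first kind of operation: it mutates the shared DataFrame in place.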

Upvotes: 0
