Reputation: 39
I've built two Pandas dataframes like this:
import panda as pd
d = {'FIPS' : pd.Series(['01001', '01002']), 'count' : pd.Series([3, 4])}
df1 = pd.DataFrame(d)
df2 = df1
I want to change one of the values in df2. This is what I've tried:
df2.loc[df2['FIPS'] == '01001','FIPS'] = '01003'
This line appears to update both df1 and df2, but I don't understand why.
Upvotes: 0
Views: 487
Reputation: 1593
Because df2 is only a reference to df1. Both names point to the same object in memory. If you use df2 = df1.copy() instead, it creates a new object for df2, and only df2 gets updated. Also, you have a typo in your import: it should be import pandas as pd :)
You can check an object's identity (in CPython, its memory address) with id(df1) and see that it's the same as id(df2), and that it changes if you use the .copy() method.
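A minimal sketch of that check, reusing the data from the question. The names alias and clone are just illustrative, and the `is` operator is used here as a shorthand for comparing id() values:

```python
import pandas as pd

d = {'FIPS': pd.Series(['01001', '01002']), 'count': pd.Series([3, 4])}
df1 = pd.DataFrame(d)

alias = df1         # a second name for the same DataFrame object
clone = df1.copy()  # a new, independent DataFrame (deep copy by default)

print(id(alias) == id(df1))  # True: same object
print(id(clone) == id(df1))  # False: different object

# Editing the copy leaves the original untouched
clone.loc[clone['FIPS'] == '01001', 'FIPS'] = '01003'
print(df1['FIPS'].tolist())
print(clone['FIPS'].tolist())
```

Only clone should show '01003' after the update; df1 keeps its original values.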
Welcome to SO!
Upvotes: 1
Reputation: 1114
Instead of df2 = df1, say df2 = df1.copy().
The issue is that variables in Python act like "pointers": they store references to objects rather than the objects themselves, and assignment never makes a copy. So in your code above, df2 becomes another name, or alias, for the same DataFrame as df1. Hence the unexpected change.
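The same behaviour shows up with any mutable object, not just DataFrames. Here is a rough illustration with a plain list (the variable names are made up for the example):

```python
import copy

a = [1, 2, 3]
b = a             # b is a second name for the same list
b.append(4)       # the mutation is visible through both names
print(a)          # [1, 2, 3, 4]

c = copy.copy(a)  # shallow copy: a new list object
c.append(5)
print(a)          # still [1, 2, 3, 4]

x = 10
y = x             # y refers to the same int object as x
y = y + 1         # this rebinds y to a new object; x is unaffected
print(x, y)       # 10 11
```

Rebinding a name, as with y above, never affects other names. Only in-place changes to a shared object, like append or a .loc assignment, are visible through every alias.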
Upvotes: 0