Reputation: 51
I'm struggling with updating values from a dataframe with values from another dataframe using the row index as key. Dataframes are not identical in terms of number of columns so updating can only occur for matching columns. Using the code below it would mean that df3 yields the same result as df4. However df3 returns a None object.
Anyone who can put me in the right direction? It doesn't seem very complicated but I can't seem to get it right
ps. In reality the 2 dataframes are a lot larger than the ones in this example (both in terms of rows and columns)
import pandas as pd
data1 = {'A': [1, 2, 3,4],'B': [4, 5, 6,7],'C':[7,8,9,10]}
df1 = pd.DataFrame(data1,index=['I_1','I_2','I_3','I_4'])
print(df1)
data2 = {'A': [10, 40], 'B': [40, 70]}
df2 = pd.DataFrame(data2 ,index=['I_1','I_4'])
print(df2)
df3 = df1.update(df2)
print(df3)
data4 = {'A': [10, 2, 3,40],'B': [40, 5, 6,70],'C':[7,8,9,10]}
df4 = pd.DataFrame(data4 ,index=['I_1','I_2','I_3','I_4'])
print(df4)
```
Upvotes: 2
Views: 437
Reputation: 1496
pandas.DataFrame.update
returns None. The method directly changes calling object.
source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html
for your example this means two things.
df3=none
df1
got changed when df3 = df1.update(df2)
gets called. In your case df1
would look like df4
from that point on.to write df3
and leave df1
untouched this could be done:
import pandas as pd
data1 = {'A': [1, 2, 3,4],'B': [4, 5, 6,7],'C':[7,8,9,10]}
df1 = pd.DataFrame(data1,index=['I_1','I_2','I_3','I_4'])
print(df1)
data2 = {'A': [10, 40], 'B': [40, 70]}
df2 = pd.DataFrame(data2 ,index=['I_1','I_4'])
print(df2)
#using deep=False if df1 should not get affected by the update method
df3 = df1.copy(deep=False)
df3.update(df2)
print(df3)
data4 = {'A': [10, 2, 3,40],'B': [40, 5, 6,70],'C':[7,8,9,10]}
df4 = pd.DataFrame(data4 ,index=['I_1','I_2','I_3','I_4'])
print(df4)
Upvotes: 2