Reputation: 2053
I have these two dataframes: (update: I've added one column stuff
to df1
to specify that the two dataframes have not the same schemas)
df1 = pd.DataFrame({'id': ['1','2','3'],
'val': [0, 0, 0],
'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20]})
print(df1)
id val stuff
0 1 0 foo
1 2 0 bar
2 3 0 spam
print(df2)
id val
0 2 10
1 3 20
I want to update the values in df1
val
column with the values from df2
val
column based on the id
column. Desired result after transformation on df1
:
print(df1)
id val stuff
0 1 0 foo
1 2 10 bar
2 3 20 spam
I could use a join (merge
) but then I would need several more steps to end up with the expected result (casting the column from float to int, dropping column, etc.). (BTW, if you have a simple and elegant way to it with a join, I am also interested).
I am trying to use slicing methods but could not figure out how. Example:
>>> df1.loc[df1['id'].isin(df2['id']), 'val'] = df2['val']
gives:
print(df1)
id val stuff
0 1 0.0 foo
1 2 20.0 bar
2 3 NaN spam
Update: One more constraint: do not modify the original df1
index.
Upvotes: 2
Views: 2961
Reputation: 4233
You can use concat
and drop duplicates
df = pd.concat([df1,df2]).set_index('id').drop_duplicates().reset_index()
id val
0 1 0
1 2 10
2 3 20
If you don't want to modify original index then something like this will work.
df1 = pd.DataFrame({'id': ['1','2','3'], 'val': [0, 0, 0] ,
'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20] })
df1.set_index('id', inplace=True)
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)
# df1.val = df1.val.astype(int) # convert to int
print(df1)
id val stuff
0 1 0 foo
1 2 10 bar
2 3 20 spam
Upvotes: 0
Reputation: 2053
I found a solution using a merge
:
df1 = df1.merge(df2, how='left', on='id')
df1['val'] = np.where(df1['val_y'].isnull(), df1['val_x'], df1['val_y'])
# recast to int
df1['val'] = df1['val'].astype(int)
# remove extra columns
df1.drop(['val_x', 'val_y'], axis=1, inplace=True)
print(df1)
id stuff val
0 1 foo 0
1 2 bar 10
2 3 spam 20
Upvotes: 1
Reputation: 76917
You could also do map
In [88]: df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])
Out[88]:
0 0.0
1 10.0
2 20.0
Name: id, dtype: float64
In [89]: df1['val'] = df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])
In [90]: df1
Out[90]:
id val
0 1 0.0
1 2 10.0
2 3 20.0
Upvotes: 1