I am looking for a cleaner way to achieve the following:
I have a DataFrame with certain columns that I want to update when new information arrives. This "new information", in the form of a pandas DataFrame
(from a CSV file), can have more or fewer rows; however, I am only interested in adding
(Note the missing name "c" here and the change in "status" for name "a".)
I wrote the following "inconvenient" code to update the original DataFrame with the new information:
for idx, row in df_base.iterrows():
    if not df_upd[df_upd['name'] == row['name']].empty:
        df_base.loc[idx, 'status'] = df_upd.loc[df_upd['name'] == row['name'], 'status'].values
It achieves exactly what I want, but it looks neither nice nor efficient, and I hope there is a cleaner way. I tried pd.merge; however, the problem is that it adds new columns instead of "updating" the cells in the existing column:
pd.merge(left=df_base, right=df_upd, on=['name'], how='left')
I am looking forward to your tips and ideas.
Upvotes: 1
Views: 810
Reputation: 353199
You could set_index("name") and then call .update:
>>> df_base = df_base.set_index("name")
>>> df_upd = df_upd.set_index("name")
>>> df_base.update(df_upd)
>>> df_base
      status
name
a          0
b          1
c          0
d          1
More generally, you can set the index to whatever seems appropriate, update, and then reset as needed.
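As a rough sketch of that set-index, update, reset pattern: the helper name update_by_key and the sample frames below are hypothetical (the thread's original data is not shown), so treat this as an illustration under those assumptions rather than a definitive implementation.

```python
import pandas as pd

# Hypothetical sample data; the frames from the question are not shown in the thread.
df_base = pd.DataFrame({"name": ["a", "b", "c", "d"], "status": [1, 1, 0, 1]})
df_upd = pd.DataFrame({"name": ["a", "b", "d", "e"], "status": [0, 1, 1, 1]})


def update_by_key(base, upd, key):
    """Return a new frame: base with overlapping cells overwritten from upd, matched on key."""
    out = base.set_index(key)        # set_index returns a new frame; base is left untouched
    out.update(upd.set_index(key))   # in place; aligns on the index, skips keys missing from out
    return out.reset_index()         # turn the key back into an ordinary column


result = update_by_key(df_base, df_upd, "name")
print(result)
```

Here the row for "c" keeps its old status because it is absent from df_upd, and the extra row "e" in df_upd is ignored because .update never adds rows. Note also that .update does not overwrite existing values with NaN from the other frame.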
Upvotes: 2