user2489252

A clean and efficient way to update cells in pandas DataFrames

I am looking for a cleaner way to achieve the following:

I have a DataFrame with certain columns that I want to update when new information arrives. This "new information" comes in the form of a pandas DataFrame (from a CSV file) and can have more or fewer rows; however, I am only interested in adding

Original DataFrame


DataFrame with new information


(Note the missing name "c" here and the change in "status" for name "a")

Now, I wrote the following "inconvenient" code to update the original DataFrame with the new information

Updating the "status" column based on the "name" column

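for idx, row in df_base.iterrows():
    # Status values in the update frame that share this row's name
    match = df_upd.loc[df_upd['name'] == row['name'], 'status']
    if not match.empty:
        # Assign a scalar (the first match) rather than a whole array
        df_base.loc[idx, 'status'] = match.iloc[0]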


It achieves exactly what I want, but it neither looks nice nor is efficient, and I hope there is a cleaner way. I tried pd.merge; however, the problem is that it adds new columns instead of "updating" the cells in the existing column.

pd.merge(left=df_base, right=df_upd, on=['name'], how='left')
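
To illustrate the column issue, here is a minimal sketch with made-up data. The real frames appear only as images in the post, so the values below are assumptions chosen to match the description (no "c" in the update, a changed status for "a"). Because both frames carry a "status" column, the merge keeps both and tells them apart with suffixes:

```python
import pandas as pd

# Assumed sample data; the original frames are not reproduced in the post.
df_base = pd.DataFrame({"name": ["a", "b", "c", "d"], "status": [1, 1, 0, 1]})
df_upd = pd.DataFrame({"name": ["a", "b", "d"], "status": [0, 1, 1]})

merged = pd.merge(left=df_base, right=df_upd, on=["name"], how="left")

# Both "status" columns survive, renamed with the default suffixes.
print(list(merged.columns))  # ['name', 'status_x', 'status_y']
```

The row for "c" has no match in the update frame, so its "status_y" value is NaN.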


I am looking forward to your tips and ideas.

Upvotes: 1

Views: 810

Answers (1)

DSM

Reputation: 353199

You could set_index("name") and then call .update:

>>> df_base = df_base.set_index("name")
>>> df_upd = df_upd.set_index("name")
>>> df_base.update(df_upd)
>>> df_base
      status
name        
a          0
b          1
c          0
d          1

More generally, you can set the index to whatever seems appropriate, update, and then reset as needed.
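
The set-index, update, reset round trip could be wrapped up roughly as follows. This is only a sketch: `update_by_key` is a hypothetical helper name, the sample data is assumed (the frames in the question are only shown as images), and the key column is assumed to be unique in both frames.

```python
import pandas as pd


def update_by_key(base, upd, key):
    """Return a copy of `base` whose overlapping cells are replaced by `upd`.

    Hypothetical helper: rows are matched on `key`, assumed unique in both frames.
    """
    indexed = base.set_index(key)
    # update() aligns on the index, modifies `indexed` in place and returns None.
    # Labels that exist only in `upd` are ignored, and NaN values in `upd`
    # do not overwrite existing values (the default overwrite=True only
    # applies non-NA values).
    indexed.update(upd.set_index(key))
    # Turn the key back into an ordinary column.
    return indexed.reset_index()


# Assumed sample data: "c" is missing from the update, "a" changes status,
# and "e" exists only in the update.
df_base = pd.DataFrame({"name": ["a", "b", "c", "d"], "status": [1, 1, 0, 1]})
df_upd = pd.DataFrame({"name": ["a", "b", "d", "e"], "status": [0, 1, 1, 1]})

result = update_by_key(df_base, df_upd, "name")
print(result)
```

With this data, "a" picks up the new status, "c" keeps its old one, and "e" is not added. Depending on the pandas version, the updated column may come back as float, so an `astype(int)` afterwards may be needed if integer dtype matters.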

Upvotes: 2
