barciewicz
barciewicz

Reputation: 3773

Replace values in a dataframe with values from another dataframe when common value is found in specific column

I am trying to replace hours in df with hours from replacements for project IDs that exist in both dataframes:

import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else' :['a', 'b', 'c', 'd', 'e']
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']

print(df)

However, only the project ID 3 gets correct assignment (1000), but both 2 and 5 get NaN:

 projects   hours else
0         1   111.0    a
1         2     NaN    b
2         3  1000.0    c
3         4   444.0    d
4         5     NaN    e
  1. How can I fix it?
  2. Is there a better way to do this?

Upvotes: 0

Views: 132

Answers (3)

Erfan
Erfan

Reputation: 42886

Another solution would be to use pandas.merge and then use fillna:

df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=['_1', ''])
df_new['hours'].fillna(df_new['hours_1'], inplace=True)
df_new.drop('hours_1', axis=1, inplace=True)

print(df_new)
   project_ids else   hours
0            1    a   111.0
1            2    b   666.0
2            3    c  1000.0
3            4    d   444.0
4            5    e   999.0

Upvotes: 0

anky
anky

Reputation: 75080

Another way using df.update():

m=df.set_index('project_ids')
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())

   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

Upvotes: 1

jezrael
jezrael

Reputation: 862481

Use Series.map with another Series created by replacements with DataFrame.set_index:

s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

Upvotes: 1

Related Questions