user11357465
user11357465

Reputation:

How to apply if statements across multiple DataFrames?

I have two dataframes with different shapes, I want to apply an conditional If statement to df1 and input values from df2. Df1 will have duplicate rows but I need them filled, which means replacing -9 values with the values from df2 in a particular column

df1:

Code 1    Name
2         Sam
5         James
7         Mark
6         Steven
-9        Michael
-9        Sarah
-9        Sam
5         James
-9        Mark
6         Steven
7         Michael
-9        Sarah
-9        Chris

df2:

Code 1    Name
20        Sam
30        James
40        Mark
50        Steven
70        Michael
45        Sarah

df1 Intended output:

Code 1    Name
2         Sam
5         James
7         Mark
6         Steven
70        Michael
45        Sarah
20        Sam
5         James
40        Mark
6         Steven
7        Michael
45        Sarah
-9        Chris

Upvotes: 1

Views: 298

Answers (2)

Parfait
Parfait

Reputation: 107632

With an SQL mentality, consider a set-based approach with merge, assign, and np.where conditional logic (counterparts to SQL's JOIN, SELECT, and CASE):

df1 = (df1.merge(df2, on="Name", how='left', suffixes=['','_'])
          .assign(Code1 = lambda x: (np.where((x['Code1'] == -9) & (pd.notna(x['Code1_'])), 
                                               x['Code1_'], 
                                               x['Code1'])).astype(int))
          .drop(columns = ['Code1_'])
      )

#     Code1     Name
# 0       2      Sam
# 1       5    James
# 2       7     Mark
# 3       6   Steven
# 4      70  Michael
# 5      45    Sarah
# 6      20      Sam
# 7       5    James
# 8      40     Mark
# 9       6   Steven
# 10      7  Michael
# 11     45    Sarah
# 12     -9    Chris

Online Demo (click Run at top)

Upvotes: 0

jezrael
jezrael

Reputation: 862661

Use Series.map for new Series and replace only matched rows by condition, last for non matched rows replace missing values to -9:

m = df1['Code 1'] == -9
df1.loc[m, 'Code 1'] = df1.loc[m, 'Name'].map(df2.set_index('Name')['Code 1'])
df1['Code 1'] = df1['Code 1'].fillna(-9).astype(int)
print (df1)
    Code 1     Name
0        2      Sam
1        5    James
2        7     Mark
3        6   Steven
4       70  Michael
5       45    Sarah
6       20      Sam
7        5    James
8       40     Mark
9        6   Steven
10       7  Michael
11      45    Sarah
12      -9    Chris

Upvotes: 1

Related Questions