Reputation: 385
I need to add a column to a dataframe and fill it with values from another dataframe, but I do not have a unique ID or key or index that is shared between them. They do have two identifiers in common that make each row unique between them, and I want to try and match on both those columns.
Here is an example...
Student ID Test Day Student ID Test Day Grade
E0304003 Quiz 1 E0304003 Quiz 1 A
E3305002 Quiz 1 E0303003 Test 2 A
E0303002 Test 1 E0304005 Quiz 2 D
E0304004 Quiz 1 E3306001 Test 1 C
E0301001 Quiz 1 E0301001 Quiz 1 A
E0307002 Test 2 E0307002 Test 2 C
E0303003 Test 2 E0308002 Quiz 2 A
E0304005 Quiz 2 E0301002 Test 2 C
E3306001 Test 1 E3305002 Quiz 1 C
E0308002 Quiz 2 E0303002 Test 1 F
E0301002 Test 2 E0304004 Quiz 1 C
I want to create a third column in the first set that is also Grade, and contains the same values for rows that match the student ID and Test Day in the other dataframe.
You can see that the column order in each set doesn't match, and in the real data, there will be missing values, so I need this to be dynamic; it's not as simple as just sorting and then matching.
Can I do something like:
For row in df_one:
if df_one["Student ID"] == df_two["Student ID"] and df_one["Test Day"] == df_two["Day"]:
df_one["Grade"] = df_two["Grade"]
else:
df_one["Grade"] = "Unknown"
Upvotes: 2
Views: 2346
Reputation: 863301
Use merge
with left join and if some values not match get NaN
s, which are replaced by fillna
:
df = pd.merge(df_one, df_two, on=['Student ID','Test Day'], how='left')
df['Grade'] = df['Grade'].fillna('Unknown')
print (df)
Student ID Test Day Grade
0 E0304003 Quiz 1 A
1 E3305002 Quiz 1 C
2 E0303002 Test 1 F
3 E0304004 Quiz 1 C
4 E0301001 Quiz 1 A
5 E0307002 Test 2 C
6 E0303003 Test 2 A
7 E0304005 Quiz 2 D
8 E3306001 Test 1 C
9 E0308002 Quiz 2 A
10 E0301002 Test 2 C
Upvotes: 2