Reputation: 245
Trying to update a column in df1 to df2 based on another value in df1.
df1:
# Create DataFrame
df1 = pd.DataFrame({'unique_value': ['xyz123', 'eff987', 'efg125', 'xyz123', 'eff987']})
# Assign ID column (no duplicates)
df1['ID'] = pd.factorize(df1['unique_value'])[0] + 1000
# Call df1
df1
unique_value ID
0 xyz123 1000
1 eff987 1001
2 efg125 1002
3 xyz123 1000
4 eff987 1001
df2:
# Create DataFrame
df2 = pd.DataFrame({'unique_value': ['xyz123', 'eff987', 'efg125', 'xyz123']})
# Call df2
df2
unique_value
0 xyz123
1 eff987
2 efg125
3 xyz123
Desired output for df2:
unique_value ID
0 xyz123 1000
1 eff987 1001
2 efg125 1002
3 xyz123 1000
Can we use a type of table join?
Upvotes: 1
Views: 139
Reputation: 62393
.merge
with drop_duplicates
'unique_value'
.import pandas as pd
df1 = pd.DataFrame({'unique_value': ['xyz123', 'eff987', 'efg125', 'xyz123', 'eff987'], 'ID': [1000, 1001, 1002, 1000, 1001]})
df2 = pd.DataFrame({'unique_value': ['xyz123', 'eff987', 'efg125', 'xyz123']})
# merge df2 and df1, while dropping duplicates from df1
df2 = df2.merge(df1.drop_duplicates(), on='unique_value', how='left')
# df2
unique_value ID
0 xyz123 1000
1 eff987 1001
2 efg125 1002
3 xyz123 1000
df2 = df2.merge(df1, on='unique_value', how='left')
# df2
unique_value ID
0 xyz123 1000
1 xyz123 1000
2 eff987 1001
3 eff987 1001
4 efg125 1002
5 xyz123 1000
6 xyz123 1000
Upvotes: 2