Merging two pandas data frames with duplicate ON entries

Question

I'm working with two pandas DataFrame (two separate years), that contains information about individuals. I'm using pd.merge with inner join. I can set the 'on' based on the individual names. However, I noticed that there are duplicate entries with the same name (essentially two people with the same name). To separate these records I use other fields, such as their locations. My code looks like:

df_combine = pd.merge(df_current, df_old, on=['Name', 'Department', 'College Name'])

While that works to fix this specific case, there are cases where someone with a unique name moves department. As such, this is not caught.

Is there way to first merge by name for unique cases and then for duplicate records by name to consider other fields?

For clarity, consider the records having two "Doe,John" in two different departments and one "Doe,Jane" in one department for one year and another for the second year. I would like the code to recognize the two "Doe,John" and one "Doe,Jane"

Edit, example dataframes:

Latest dataframe:

Name, Department, Field1
"Doe,John","ABC",555
"Doe,John","DEF",145
"Doe,Jane","FGH",205

Previous dataframe:

Name, Department, Field1
"Doe,John","ABC",345
"Doe,John","DEF",140
"Doe,Jane","ABC",200

Note the change in Jane Doe's department.

Expected result:

Name, Department, Field1_A, Field1_B
"Doe,John","ABC",555,345
"Doe,John","DEF",145,140
"Doe,Jane","FGH",205,200

Merging two pandas data frames with duplicate ON entries

Answers (1)

Related Questions