How to find records with same value in one column but different value in another column

Question

I have two pandas df with the exact same column names. One of these columns is named id_number which is unique to each table (What I mean is an id_number can only appear once in each df). I want to find all records that have the same id_number but have at least one different value in any column and store these records in a new pandas df.

I've tried merging (more specifically inner join), but it keeps only one record with that specific id_number so I can't look for any differences between the two dfs.

Let me provide some example to provide a clearer explanation:

Example dfs:

First DF:

id_number    name    type    city
    1        John    dev     Toronto
    2        Alex    dev     Toronto
    3        Tyler   dev     Toronto
    4        David   dev     Toronto
    5        Chloe   dev     Toronto

Second DF:

id_number    name    type    city
    1        John    boss    Vancouver
    2        Alex    dev     Vancouver
    4        David   boss    Toronto
    5        Chloe   dev     Toronto
    6        Kyle    dev     Vancouver

I want the resulting df to contain the following records:

id_number    name    type    city
    1        John    dev     Toronto
    1        John    boss    Vancouver
    2        Alex    dev     Toronto
    2        Alex    dev     Vancouver
    4        David   dev     Toronto
    4        David   Boss    Toronto

NOTE: I would not want records with id_number 5 to appear in the resulting df, that is because the records with id_number 5 are exactly the same in both dfs.

In reality, there are 80 columns for each record, but I think these tables make my point a little clearer. Again to summarize, I want the resulting df to contain records with same id_numbers, but a different value in any of the other columns. Thanks in advance for any help!

Scott Boston · Accepted Answer

Here is, a way using pd.concat, drop_duplicates and duplicated:

pd.concat([df1, df2]).drop_duplicates(keep=False).sort_values('id_number')\
  .loc[lambda x: x.id_number.duplicated(keep=False)]

Output:

   id_number   name  type       city
0          1   John   dev    Toronto
0          1   John  boss  Vancouver
1          2   Alex   dev    Toronto
1          2   Alex   dev  Vancouver
3          4  David   dev    Toronto
2          4  David  boss    Toronto

How to find records with same value in one column but different value in another column

Answers (2)

Related Questions