Reputation: 1293
I've got a pandas DataFrame whose columns are address fields. My problem is that, in some rows, two columns hold the same value. Does anybody know how I can conditionally change the value in one column when a duplicate is found across the two columns? Ideally I would like to keep one value and set the other to np.nan.
Here is a test case:
import pandas as pd
test = pd.read_json('{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}')
city house_number housename name postcode street
16 Horsham 32 NaN Shoecare RH12 1EE Carfax
17 Horsham 33 NaN 33 RH12 1EE Carfax
18 Horsham 33A NaN 33A RH12 1EE Carfax
On the test case, I've played with test.duplicated(subset=['house_number', 'name']), but it won't identify duplicate values across the house_number and name columns.
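For reference, duplicated() only flags rows whose whole (house_number, name) combination repeats an earlier row; it never compares the two columns within a single row, so on the test frame it returns all False:
test.duplicated(subset=['house_number', 'name'])
16    False
17    False
18    False
dtype: bool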
Does anybody have any suggestions for how to first identify duplicated cells across two columns, and then set one value to np.nan?
Desired output:
housename name house_number street city postcode
16 NaN Shoecare 32 Carfax Horsham RH12 1EE
17 NaN NaN 33 Carfax Horsham RH12 1EE
18 NaN NaN 33A Carfax Horsham RH12 1EE
Upvotes: 1
Views: 42
Reputation: 12417
If the two columns are house_number and name, you can do it this way:
import numpy as np
test['name'] = np.where(test['house_number'] == test['name'], np.nan, test['name'])
Output:
city house_number housename name postcode street
16 Horsham 32 NaN Shoecare RH12 1EE Carfax
17 Horsham 33 NaN NaN RH12 1EE Carfax
18 Horsham 33A NaN NaN RH12 1EE Carfax
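An equivalent one-liner, assuming the same test frame, is Series.mask, which replaces a value with NaN wherever the condition holds and avoids the explicit numpy import:
test['name'] = test['name'].mask(test['house_number'] == test['name'])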
Upvotes: 2