Find duplicates based on specific key/value only

Question

I'm trying to tag duplicate objects in JSON data using Python, comparing only the values for "price" and "full address" and ignoring "url". Each duplicate should get a new "duplicate" key, numbered 1, 2, and so on within its group. What is the best way to do this? Current data:

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

Intended result:

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 1,
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 2,
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

iz_ · Accepted Answer

Keep a running tally for each (price, full address) pair, then do a second pass to delete the "duplicate" key from any object whose pair occurred only once:

from collections import defaultdict

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

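# Number of times each (price, full address) pair has been seen so far
counts = defaultdict(int)

# First pass: give every object its running position within its group
for d in A:
    k = (d["price"], d["full address"])
    counts[k] += 1
    d["duplicate"] = counts[k]

# Second pass: a final count of 1 means the object has no duplicate,
# so remove the tag that the first pass added
for d in A:
    if counts[(d["price"], d["full address"])] == 1:
        del d["duplicate"]

print(A)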
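
If you would rather not add the key and then delete it again, one possible variation is to count every group up front and only tag objects whose group has more than one member. This is just a sketch: the function name `tag_duplicates` and its `keys` and `tag` parameters are made up for illustration, and it assumes the compared values are hashable (numbers and strings are).

```python
from collections import Counter, defaultdict


def tag_duplicates(records, keys=("price", "full address"), tag="duplicate"):
    """Number, in place, the records that share the same values for `keys`.

    Hypothetical helper, not part of any library.
    """
    def make_key(rec):
        # Only the chosen fields take part in the comparison; "url" is ignored
        return tuple(rec[k] for k in keys)

    # Total size of each group, computed before any tagging
    totals = Counter(make_key(rec) for rec in records)

    # Running position inside each group that has duplicates
    seen = defaultdict(int)
    for rec in records:
        k = make_key(rec)
        if totals[k] > 1:
            seen[k] += 1
            rec[tag] = seen[k]
    return records


listings = [
    {"url": "google.com", "price": 550, "full address": "123 sesame st"},
    {"url": "yahoo.com", "price": 550, "full address": "123 sesame st"},
    {"url": "bing.com", "price": 250, "full address": "123 50th st"},
]

tag_duplicates(listings)
print(listings)
```

This still makes two passes over the list, but unique objects are never touched, and the compared fields can be changed by passing a different `keys` tuple.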
