Reputation: 25
I'm trying to tag duplicate objects in JSON data using Python, based only on the key/values for "price" and "full address" and ignoring "url". A new "duplicate" key should then be added, with a value of 1 or 2 for each duplicate. How can this best be done? Current:
A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]
Intended result:
A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 1,
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 2,
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]
Upvotes: 0
Views: 138
Reputation: 809
Optimized version of @iz_'s answer:
Instead of doing a second pass to delete the key from any non-duplicate, add the "duplicate"
key only once a second occurrence is found. This way, we iterate over the whole list only once.
from collections import defaultdict

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

# For each (price, address) key, track how often it has been seen
# and the index of its first occurrence.
counts = defaultdict(dict)
for index, d in enumerate(A):
    k = (d["price"], d["full address"])
    counts[k]["count"] = counts[k].get("count", 0) + 1
    if counts[k]["count"] == 1:
        counts[k]["first_occurrence"] = index
    else:
        # A repeat: tag the first occurrence retroactively, then this one.
        A[counts[k]["first_occurrence"]]["duplicate"] = 1
        d["duplicate"] = counts[k]["count"]
print(A)
Output:
[{'full address': '123 sesame st', 'duplicate': 1, 'price': 550, 'url': 'google.com'}, {'full address': '123 sesame st', 'duplicate': 2, 'price': 550, 'url': 'yahoo.com'}, {'full address': '123 50th st', 'price': 250, 'url': 'bing.com'}]
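The same single-pass idea can be wrapped in a function, which may be handy if the comparison keys change later. This is only a sketch: the name `tag_duplicates`, its `keys` parameter, and the `listings` variable are made up for illustration, and it stores a reference to the first matching dict instead of its index.

```python
def tag_duplicates(items, keys=("price", "full address")):
    """Tag repeated items in place with a 1-based "duplicate" counter.

    Hypothetical helper: `keys` lists the fields that decide equality.
    """
    seen = {}  # key tuple -> [first item with that key, occurrences so far]
    for item in items:
        k = tuple(item[name] for name in keys)
        if k not in seen:
            # First sighting: remember the item, but do not tag it yet.
            seen[k] = [item, 1]
            continue
        first, count = seen[k]
        first["duplicate"] = 1         # re-assigning on later repeats is harmless
        item["duplicate"] = count + 1
        seen[k][1] = count + 1
    return items


listings = [
    {"url": "google.com", "price": 550, "full address": "123 sesame st"},
    {"url": "yahoo.com", "price": 550, "full address": "123 sesame st"},
    {"url": "bing.com", "price": 250, "full address": "123 50th st"},
]
tag_duplicates(listings)
print(listings)
```

Note that the field values must be hashable (numbers and strings are) for the tuple to work as a dictionary key.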
Upvotes: 1
Reputation: 16613
Keep a running tally of duplicates and do a second pass to delete the key for any non-duplicate:
from collections import defaultdict

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

counts = defaultdict(int)
for d in A:
    k = (d["price"], d["full address"])
    counts[k] += 1
    d["duplicate"] = counts[k]
for d in A:
    if counts[(d["price"], d["full address"])] == 1:
        del d["duplicate"]
print(A)
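A possible variation on the two-pass approach, sketched here under the assumption that the data arrives as JSON text: count every key first with `collections.Counter`, then tag only the groups that occur more than once, so nothing has to be deleted afterwards. The names `raw`, `records`, `key_of`, `totals`, `running`, and `tagged` are illustrative.

```python
import json
from collections import Counter

# Assumed input: the same three objects as above, as a JSON string.
raw = """[
  {"url": "google.com", "price": 550, "full address": "123 sesame st"},
  {"url": "yahoo.com", "price": 550, "full address": "123 sesame st"},
  {"url": "bing.com", "price": 250, "full address": "123 50th st"}
]"""

records = json.loads(raw)


def key_of(record):
    # Only these two fields decide whether objects are duplicates; "url" is ignored.
    return (record["price"], record["full address"])


# Pass 1: total occurrences per key.
totals = Counter(key_of(r) for r in records)

# Pass 2: number the members of every group that has more than one entry.
running = Counter()
for r in records:
    k = key_of(r)
    if totals[k] > 1:
        running[k] += 1
        r["duplicate"] = running[k]

tagged = json.dumps(records, indent=2)
print(tagged)
```

This still walks the list twice, like the answer above, but the "duplicate" key is never written to a unique object in the first place.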
Upvotes: 1