Reputation: 53
I have a JSON file that has duplicate keys.
Example:
{
"data":"abc",
"data":"xyz"
}
I want to turn this into { "data1":"abc", "data2":"xyz" }
I tried using object_pairs_hook with json.loads, but it is not working. Could anyone help me with a Python solution for the above problem?
Upvotes: 2
Views: 1017
Reputation: 1653
A quick and dirty solution using re.
import re

s = '{ "data":"abc", "data":"xyz", "test":"one", "test":"two", "no":"numbering" }'

def find_dupes(s):
    # Collect every key name, then keep those that occur more than once
    keys = re.findall(r'"(\w+)":', s)
    return list(set(filter(lambda w: keys.count(w) > 1, keys)))

for key in find_dupes(s):
    # Rename one occurrence per pass: the first becomes key1, the next key2, ...
    for i in range(1, len(re.findall(r'"{}":'.format(key), s)) + 1):
        s = re.sub(r'"{}":'.format(key), r'"{}{}":'.format(key, i), s, count=1)

print(s)
This prints the following string (shown here with line breaks added for readability):
{
"data1":"abc",
"data2":"xyz",
"test1":"one",
"test2":"two",
"no":"numbering"
}
Upvotes: 0
Reputation: 13810
You can pass the loads function the object_pairs_hook keyword parameter to handle the key/value pairs; there you can check for duplicates like this:
import json
from collections import Counter

raw_text_data = """{
"data":"abc",
"data":"xyz",
"data":"xyz22"
}"""

def manage_duplicates(pairs):
    d = {}
    k_counter = Counter()  # occurrences of each key seen so far
    for k, v in pairs:
        k_counter[k] += 1  # increment first so numbering starts at 1, as in the question
        d[k + str(k_counter[k])] = v
    return d

print(json.loads(raw_text_data, object_pairs_hook=manage_duplicates))
I used Counter to count the occurrences of each key and save each key as k + str(k_counter[k]), so it gets a trailing number. Note that this numbers every key, including keys that occur only once.
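If keys that occur only once should keep their original name, as the "no" key does in the other answer's output, one option is to count the keys first and number only the repeated ones. This is a minimal sketch under that assumption; number_duplicates is a hypothetical name, not part of the json module.

```python
import json
from collections import Counter


def number_duplicates(pairs):
    """Suffix only keys that occur more than once: data -> data1, data2."""
    totals = Counter(key for key, _ in pairs)  # occurrences per key in this object
    seen = Counter()                           # running count per repeated key
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            result[f"{key}{seen[key]}"] = value
        else:
            result[key] = value                # unique keys stay untouched
    return result


raw = '{"data": "abc", "data": "xyz", "no": "numbering"}'
print(json.loads(raw, object_pairs_hook=number_duplicates))
# {'data1': 'abc', 'data2': 'xyz', 'no': 'numbering'}
```

The hook is called once per JSON object, so nested objects are numbered independently. It does not guard against a renamed key colliding with an existing key such as "data1".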
P.S.
If you have control over the input, I would highly recommend changing your JSON structure to:
{"data": ["abc", "xyz"]}
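If the input cannot be changed, the same object_pairs_hook mechanism could also produce this list structure at load time. The following is only a sketch; collect_duplicates is a hypothetical helper name.

```python
import json


def collect_duplicates(pairs):
    """Group the values of repeated keys into a list; unique keys keep their value."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    # Unwrap single-element lists so keys that occur once are unchanged
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in grouped.items()}


raw = '{"data": "abc", "data": "xyz", "no": "numbering"}'
print(json.loads(raw, object_pairs_hook=collect_duplicates))
# {'data': ['abc', 'xyz'], 'no': 'numbering'}
```

One trade-off of this shape: after loading, a key that was repeated cannot be told apart from a unique key whose value was already a list.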
RFC 4627, which defines the application/json media type, recommends unique keys but does not explicitly forbid them:
The names within an object SHOULD be unique.
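Because duplicates are not forbidden, Python's json module accepts them silently and keeps only the last value for a repeated key. If you would rather fail loudly on such input, a hook can raise instead. This is a sketch; reject_duplicates is a hypothetical name.

```python
import json

raw = '{"data": "abc", "data": "xyz"}'

# Default behaviour: no error, the last value for a repeated key wins
print(json.loads(raw))  # {'data': 'xyz'}


def reject_duplicates(pairs):
    """Raise ValueError if any key repeats within one object."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key: 'data'
```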
Upvotes: 2