Razorfen

Reputation: 433

Combine dicts with "duplicate" keys from list

I receive data from an internal interface as a list of dicts, where several consecutive dicts together make up one data record.

Data looks similar to this:

# received data contains 'duplicate' dict-keys
DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"}
]

In the above example, a "complete" record combines the dicts with the keys ID, PRICE and DATE.

Unfortunately, the keys occur multiple times, so when I try something like this:

result = {}
for row in DATA:
    for idx, val in row.items():
        result[idx] = val

print(result)
# {
#     'ID': 4567,
#     'PRICE': 100.99,
#     'DATE': '20201222'
# }

For each key, the later values (obviously) overwrite the earlier ones.

I can't figure out how to combine the data into this desired structure:

DESIRED = [
    {
        "ID": 1234,
        "PRICE": 77.33,
        "DATE": "20201222"
    },
    {
        "ID": 4567,
        "PRICE": 100.99,
        "DATE": "20201222"
    }
]

Any hints for this? To be honest, I'm not even sure what to search for.
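One way to avoid hard-coding a group size of three is to start a new record whenever a key shows up that the current record already contains. This is a minimal sketch, assuming records are contiguous and no key appears twice inside one record; `group_records` is a hypothetical helper name, not an existing API.

```python
def group_records(rows):
    """Merge consecutive single-key dicts into records.

    Assumption: a record is complete as soon as one of its keys repeats.
    """
    records = []
    current = {}
    for row in rows:
        # A repeated key means the next record has started.
        if current and any(key in current for key in row):
            records.append(current)
            current = {}
        current.update(row)  # copies values; the dicts in `rows` are untouched
    if current:
        records.append(current)
    return records


DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"},
]

print(group_records(DATA))
```

Because the split is driven by the keys rather than by position, a record with a missing trailing field does not shift every following record.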

Upvotes: 1

Views: 64

Answers (6)

Ajax1234

Reputation: 71451

You can use a list comprehension with a nested dict comprehension that merges each slice of three dicts:

data = [{'ID': 1234}, {'PRICE': 77.33}, {'DATE': '20201222'}, {'ID': 4567}, {'PRICE': 100.99}, {'DATE': '20201222'}]
r = [{a:b for j in data[i:i+3] for a, b in j.items()} for i in range(0, len(data), 3)]

Output:

[{'ID': 1234, 'PRICE': 77.33, 'DATE': '20201222'}, {'ID': 4567, 'PRICE': 100.99, 'DATE': '20201222'}]
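Since slicing past the end of a list never raises, a trailing incomplete group simply becomes a shorter dict. A sketch wrapping the same comprehension in a function with the group size as a parameter (`merge_groups` and `size` are illustrative names) makes that behaviour visible:

```python
def merge_groups(data, size=3):
    # Slice the list into consecutive groups of `size` dicts and merge each
    # group into one dict; a short final slice is merged as-is.
    return [
        {key: value for part in data[i:i + size] for key, value in part.items()}
        for i in range(0, len(data), size)
    ]


data = [{'ID': 1234}, {'PRICE': 77.33}, {'DATE': '20201222'},
        {'ID': 4567}, {'PRICE': 100.99}]  # last record lacks DATE

print(merge_groups(data))
# [{'ID': 1234, 'PRICE': 77.33, 'DATE': '20201222'}, {'ID': 4567, 'PRICE': 100.99}]
```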

Upvotes: 0

Ved Rathi

Reputation: 325

If you are looking for an approach that is flexible and handles data of any size with any key names, here it is (it assumes every key occurs the same number of times):

items = {}

DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"}
]

for i in DATA:
    key = list(i.keys())[0]
    val = i[key]
    if key in items:
        items[key].append(val)
    else:
        items[key] = [val]

output = []
keys = list(items.keys())
values = list(items.values())

for i in range(len(values[0])):
    curData = {}
    for k in keys:
        curData[k] = items[k][i]
    output.append(curData)

for i in output:
    print(i)


Upvotes: 0
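The loop above takes the record count from the first key only, so a key with fewer values raises `IndexError` partway through. A sketch of the same two-pass idea with an explicit check; `records_from_rows` is a hypothetical name:

```python
from collections import defaultdict


def records_from_rows(rows):
    # Collect every value under its key, preserving first-seen key order.
    columns = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)

    # The approach only works if every key occurs the same number of times.
    counts = {key: len(values) for key, values in columns.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"keys occur a different number of times: {counts}")

    n = next(iter(counts.values()), 0)  # 0 records for empty input
    return [{key: columns[key][i] for key in columns} for i in range(n)]


DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"},
]

for record in records_from_rows(DATA):
    print(record)
```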

Snehal Nair

Reputation: 191

Dictionaries do not support duplicate keys. An alternative is to map each key to a list of its values. This can be done in two ways:

Using the setdefault method (see the question "Make a dictionary with duplicate keys in Python"):

results = {}
for row in DATA:
    for k, v in row.items():
        results.setdefault(k, []).append(v)
print(results)

Using defaultdict (see the same question "Make a dictionary with duplicate keys in Python"):

from collections import defaultdict

default_dict = defaultdict(list)

for row in DATA:
    for k, v in row.items():
        default_dict[k].append(v)
print(default_dict)

Upvotes: 0
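Both variants produce a column-oriented dict (`{'ID': [1234, 4567], 'PRICE': [...], 'DATE': [...]}`) rather than the list of records asked for. A short sketch of one way to transpose it: `zip(*results.values())` yields one tuple per record. Note that `zip` stops at the shortest list, so a key with fewer values silently drops trailing records.

```python
DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"},
]

results = {}
for row in DATA:
    for k, v in row.items():
        results.setdefault(k, []).append(v)

# Pair the keys (iteration order of `results`) with each tuple of values.
records = [dict(zip(results, values)) for values in zip(*results.values())]
print(records)
```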

Apo

Reputation: 338

There might be a better way to do it, but a simple loop with a step of 3 is sufficient. As long as the input data is formatted as you showed, it will work. For example:

DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"}
]

DESIRED = []

for i in range(0, len(DATA), 3):
    DESIRED.append(dict(DATA[i]))  # ID (copied, so DATA itself is not modified)
    DESIRED[-1].update(DATA[i+1])  # PRICE
    DESIRED[-1].update(DATA[i+2])  # DATE

print(DESIRED)

Upvotes: 0
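A caveat with this pattern: if `DATA[i]` itself were appended and then `update()`d, the dict still stored in `DATA` would be modified too, so `DATA[0]` would end up with all three keys. A small sketch that merges into a shallow copy and checks that the input is left alone:

```python
DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"},
]

DESIRED = []
for i in range(0, len(DATA), 3):
    merged = dict(DATA[i])      # shallow copy, so DATA[i] itself is not changed
    merged.update(DATA[i + 1])  # PRICE
    merged.update(DATA[i + 2])  # DATE
    DESIRED.append(merged)

print(DESIRED)
print(DATA[0])  # {'ID': 1234}
```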

Dani Mesejo

Reputation: 61910

If the records always come as contiguous groups of three, you could use zip over three strided slices to iterate in triplets:

DATA = [
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"}
]


res = [{**i, **price, **date } for i, price, date in zip(DATA[::3], DATA[1::3], DATA[2::3])]
print(res)

Output:

[{'ID': 1234, 'PRICE': 77.33, 'DATE': '20201222'}, {'ID': 4567, 'PRICE': 100.99, 'DATE': '20201222'}]

An alternative is the following for loop, which builds each record explicitly:

res = []
for i, price, date in zip(DATA[::3], DATA[1::3], DATA[2::3]):
    res.append({"ID": i["ID"], "PRICE": price["PRICE"], "DATE": date["DATE"]})

Upvotes: 2

Paul M.

Reputation: 10799

If your DATA dictionaries are guaranteed to appear in the order you've shown, and they always appear in groups of three, you can grab three dictionaries at a time and merge them:

from itertools import islice

data = iter([
    {"ID": 1234},
    {"PRICE": 77.33},
    {"DATE": "20201222"},
    {"ID": 4567},
    {"PRICE": 100.99},
    {"DATE": "20201222"}
])

while chunk := list(islice(data, 3)):
    id_dict, price_dict, date_dict = chunk
    merged = {**id_dict, **price_dict, **date_dict}
    print(merged)

Output:

{'ID': 1234, 'PRICE': 77.33, 'DATE': '20201222'}
{'ID': 4567, 'PRICE': 100.99, 'DATE': '20201222'}
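The `while chunk := ...` loop needs Python 3.8+ (assignment expressions), and unpacking a short final chunk into three names raises a bare `ValueError`. A sketch of the same islice approach as a generator with a configurable chunk size and a clearer error; `merge_chunks` and `read_rows` are hypothetical names, the latter standing in for the internal interface.

```python
from itertools import islice


def merge_chunks(rows, size=3):
    """Yield one merged dict per `size` consecutive dicts (Python 3.8+)."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        if len(chunk) != size:
            raise ValueError(f"incomplete record at end of input: {chunk}")
        merged = {}
        for part in chunk:
            merged.update(part)
        yield merged


def read_rows():
    # Hypothetical stand-in for the interface: a generator, not a list.
    yield {"ID": 1234}
    yield {"PRICE": 77.33}
    yield {"DATE": "20201222"}
    yield {"ID": 4567}
    yield {"PRICE": 100.99}
    yield {"DATE": "20201222"}


DESIRED = list(merge_chunks(read_rows()))
print(DESIRED)
```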

Upvotes: 0
