SY9

Reputation: 165

Python: how to merge dicts in a list of dicts based on a value

I have a list of dicts, where each dict has 3 keys: name, url, and location.
Only the value of 'name' can repeat across the dicts; the 'url' and 'location' values are always different throughout the list.

Example:

[
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}, ...
]  

I want to group them by the value of 'name', as follows.

Expected:

[
{"name":"A1", "url":"B1, B2", "location":"C1, C2"},
{"name":"A2", "url":"B3, B4", "location":"C3, C4"},
]

(actual list consists of >2,000 dicts)

I'd be very glad to get this solved.
Any advice or answers would be greatly appreciated.

Thanks in advance.

Upvotes: 2

Views: 1465

Answers (6)

RomanPerekhrest

Reputation: 92854

With an auxiliary grouping dict (for Python 3.5+):

data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]

groups = {}
for d in data:
    if d['name'] not in groups:
        groups[d['name']] = {'url': d['url'], 'location': d['location']}
    else:
        groups[d['name']]['url'] += ', ' + d['url']
        groups[d['name']]['location'] += ', ' + d['location']
result = [{**{'name': k}, **v} for k, v in groups.items()]

print(result)

The output:

[{'name': 'A1', 'url': 'B1, B2', 'location': 'C1, C2'}, {'name': 'A2', 'url': 'B3, B4', 'location': 'C3, C4'}]

Upvotes: 4

CristiFati

Reputation: 41112

Here's a variant (it's hard to even read; it feels like scratching the right side of my head with my left hand, but at this point I don't know how to make it shorter) that uses itertools.groupby and itertools.accumulate:

>>> import itertools, pprint
>>>
>>> pprint.pprint(initial_list)
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]
>>>
>>> NAME_KEY = "name"
>>>
>>> final_list = [list(itertools.accumulate(group_list, func=lambda x, y: {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}))[-1] \
...     for group_list in [list(group[1]) for group in itertools.groupby(sorted(initial_list, key=lambda x: x[NAME_KEY]), key=lambda x: x[NAME_KEY])]]
>>>
>>> pprint.pprint(final_list)
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

Rationale (from outer to inner):

  • Group the dictionaries in the initial list based on their value corresponding to the name key (itertools.groupby)
    • An auxiliary operation for this to work properly is to sort the list on the same value prior to grouping (sorted)
  • For each such group of dictionaries, perform their "sum" (itertools.accumulate)
    • func argument "sums" 2 dictionaries, based on the keys:
      • If the key is name, just take the value from the 1st dictionary (it's the same for both dictionaries, anyway)
      • Otherwise just add the 2 values (strings) with a space in between

Considerations:

  • The dictionaries have to stay homogeneous (all must have the same structure (keys))
  • Only the name key is hardcoded (but, if you decide to add other keys which are not strings, you'll have to adjust func too)
  • It could be split for readability
  • Not sure about the lambdas (performance wise)
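
As noted above, the one-liner could be split for readability. A minimal sketch of how that might look, assuming the same homogeneous string-valued dictionaries; merge_pair and merge_by_name are names made up for this illustration, and the input is deliberately unsorted to show why the sort is needed:

```python
import itertools
from operator import itemgetter

NAME_KEY = "name"

initial_list = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B4", "location": "C4"},
]


def merge_pair(x, y):
    """'Sum' two dictionaries: keep the name, join every other value with a space."""
    return {
        key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]])
        for key in x
    }


def merge_by_name(dicts):
    """Hypothetical helper: one merged dictionary per distinct name."""
    by_name = itemgetter(NAME_KEY)
    # groupby only groups consecutive items, hence the sort on the same key.
    ordered = sorted(dicts, key=by_name)
    merged = []
    for _name, group in itertools.groupby(ordered, key=by_name):
        # accumulate yields the running "sums"; the last one covers the whole group.
        running = list(itertools.accumulate(group, merge_pair))
        merged.append(running[-1])
    return merged


final_list = merge_by_name(initial_list)
print(final_list)
```

Named functions replace the lambdas here, which also makes each step easy to test on its own.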

Upvotes: 0

salparadise

Reputation: 5805

Assuming res is:

[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]

You can work with the data using a defaultdict and unpacking the result into a list comprehension:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for items in res:
    result[items['name']]['location'].append(items['location'])
    result[items['name']]['url'].append(items['url'])

final = [
    {'name': name, **{inner_names: ' '.join(inner_values) for inner_names, inner_values in values.items()}}
    for name, values in result.items()
]

And final is:

In [57]: final
Out[57]:
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

Upvotes: 2

Shubho Shaha

Reputation: 2139

Since your dataset is relatively small, I guess time complexity is not a big deal here, so you could consider the following code.

from collections import defaultdict
given_data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
] 
D = defaultdict(list)
for item in given_data:
    D[item['name']].append(item)
result = []
for name, items in D.items():
    # Join each group's values with a single space
    urls = " ".join(item['url'] for item in items)
    locations = " ".join(item['location'] for item in items)
    result.append({'name': name, 'url': urls, 'location': locations})

Upvotes: 4

Mika72

Reputation: 411

Something like this? A small deviation: I preferred to store the URLs and locations in lists inside resDict rather than in concatenated strings.

myDict = [
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}
]

resDict = []

def getKeys(d):
    arr = []
    for row in d:
        arr.append(row["name"])
    ret = list(set(arr))
    return ret

def filteredDict(d, k):
    arr = []
    for row in d:
        if row["name"] == k:
            arr.append(row)
    return arr

def compressedDictRow(rowArr):
    urls = []
    locations = []
    name = rowArr[0]['name']

    for row in rowArr:
        urls.append(row['url'])
        locations.append(row['location'])
    return {"name":name,"urls":urls, "locations":locations}

keys = getKeys(myDict)

for key in keys:
    rowArr = filteredDict(myDict,key)
    row = compressedDictRow(rowArr)
    resDict.append(row)
print(resDict)

Outputs (in one line; the order of the groups may vary, since getKeys goes through a set):

[
    {'name': 'A2', 'urls': ['B3', 'B4'], 'locations': ['C3', 'C4']}, 
    {'name': 'A1', 'urls': ['B1', 'B2'], 'locations': ['C1', 'C2']}
]
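
The same list-valued result can probably be built in a single pass, without re-filtering the input once per name. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order, so groups come out in first-seen order); group_as_lists and my_rows are names invented for this example:

```python
my_rows = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]


def group_as_lists(rows):
    """Hypothetical helper: collect urls and locations per name in one pass."""
    grouped = {}
    for row in rows:
        # Create the entry the first time a name is seen, then reuse it.
        entry = grouped.setdefault(
            row["name"], {"name": row["name"], "urls": [], "locations": []}
        )
        entry["urls"].append(row["url"])
        entry["locations"].append(row["location"])
    return list(grouped.values())


res = group_as_lists(my_rows)
print(res)
```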

Upvotes: 0

vishal

Reputation: 1205

Based on @Yaroslav Surzhikov's comment, here is a solution using itertools.groupby:

from itertools import groupby

dicts = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]

def merge(dicts):
    def by_name(d):
        return d['name']

    new_list = []
    # groupby only groups consecutive items, so sort by the same key first
    for key, group in groupby(sorted(dicts, key=by_name), key=by_name):
        new_item = {'name': key, 'url': [], 'location': []}
        for item in group:
            new_item['url'].append(item.get('url', ''))
            new_item['location'].append(item.get('location', ''))
        # Collapse the collected values into comma-separated strings
        new_item['url'] = ', '.join(new_item['url'])
        new_item['location'] = ', '.join(new_item['location'])
        new_list.append(new_item)
    return new_list

print(merge(dicts))
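
One thing to keep in mind: itertools.groupby only groups consecutive items with equal keys, so the input has to be ordered by name for each name to end up in exactly one group. A small sketch of the difference; unsorted_rows and group_names are made up for this illustration:

```python
from itertools import groupby

unsorted_rows = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A1", "url": "B2", "location": "C2"},
]


def group_names(rows):
    """Return the key of every group that groupby produces, in order."""
    return [key for key, _group in groupby(rows, key=lambda row: row["name"])]


# Without sorting, "A1" shows up as two separate groups.
names_unsorted = group_names(unsorted_rows)

# After sorting on the same key, each name forms a single group.
names_sorted = group_names(sorted(unsorted_rows, key=lambda row: row["name"]))

print(names_unsorted)
print(names_sorted)
```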

Upvotes: 0
