Reputation: 165

python: how to merge dict in list of dicts based on value

I have a list of dicts, where each dict has 3 keys: name, url, and location.
Only the value of 'name' can repeat across the dicts; the values of 'url' and 'location' are always different throughout the list.

Example:

[
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}, ...
]

I want to group them by the value of 'name', as follows.

Expected:

[
{"name":"A1", "url":"B1, B2", "location":"C1, C2"},
{"name":"A2", "url":"B3, B4", "location":"C3, C4"},
]

(The actual list consists of >2,000 dicts.)

I'd be very glad to get this solved.
Any advice or answers would be greatly appreciated.

Thanks in advance.

Upvotes: 2

Answers (6)

RomanPerekhrest

Reputation: 92854

With an auxiliary grouping dict (for Python 3.5+, since it relies on ** unpacking in dict literals):

data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]

groups = {}
for d in data:
    if d['name'] not in groups:
        groups[d['name']] = {'url': d['url'], 'location': d['location']}
    else:
        groups[d['name']]['url'] += ', ' + d['url']
        groups[d['name']]['location'] += ', ' + d['location']
result = [{**{'name': k}, **v} for k, v in groups.items()]

print(result)

The output:

[{'name': 'A1', 'url': 'B1, B2', 'location': 'C1, C2'}, {'name': 'A2', 'url': 'B3, B4', 'location': 'C3, C4'}]

Upvotes: 4

CristiFati

Reputation: 41112

Here's a variant (it's hard to even read it, feels like scratching the right side of my head using my left hand, but at this point, I don't know how to make it shorter) that uses:

[Python]: itertools - Functions creating iterators for efficient looping
- groupby
- accumulate
Comprehensions (list and dict)

>>> import itertools
>>> import pprint
>>>
>>> pprint.pprint(initial_list)
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]
>>>
>>> NAME_KEY = "name"
>>>
>>> final_list = [list(itertools.accumulate(group_list, func=lambda x, y: {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}))[-1] \
...     for group_list in [list(group[1]) for group in itertools.groupby(sorted(initial_list, key=lambda x: x[NAME_KEY]), key=lambda x: x[NAME_KEY])]]
>>>
>>> pprint.pprint(final_list)
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

Rationale (from outer to inner):

Group the dictionaries in the initial list based on their value corresponding to the name key (itertools.groupby)
- An auxiliary operation for this to work properly is to sort the list on the same value prior to grouping (sorted)
For each such group of dictionaries, perform their "sum" (itertools.accumulate)
- func argument "sums" 2 dictionaries, based on the keys:
  - If the key is name, just take the value from the 1st dictionary (it's the same for both dictionaries, anyway)
  - Otherwise just add the 2 values (strings) with a space in between

Considerations:

The dictionaries have to stay homogeneous (all must have the same structure (keys))
Only the name key is hardcoded (but, if you decide to add other keys which are not strings, you'll have to adjust func too)
It could be split for readability
Not sure about the lambdas (performance wise)
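
As a rough illustration of the "split for readability" point, here is a sketch of the same idea broken into small functions. The helper names merge_pair and merge_by_name and the sep parameter are made up for this example, and functools.reduce is swapped in for taking the last element of itertools.accumulate (both give the final "sum" of a group). The sample data is copied from the question.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

NAME_KEY = "name"

def merge_pair(first, second, sep=" "):
    """Combine two dicts with the same keys: keep the name, join every other value."""
    return {
        key: first[key] if key == NAME_KEY else sep.join([first[key], second[key]])
        for key in first
    }

def merge_by_name(records, sep=" "):
    """Sort by name, group equal names, then fold each group into a single dict."""
    by_name = itemgetter(NAME_KEY)
    merged = []
    for _, group in groupby(sorted(records, key=by_name), key=by_name):
        merged.append(reduce(lambda a, b: merge_pair(a, b, sep), group))
    return merged

initial_list = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]

final_list = merge_by_name(initial_list)
print(final_list)
```

Passing sep=", " should give the comma-separated form from the question instead of the space-separated one.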

Upvotes: 0

salparadise

Reputation: 5805

Where res is:

[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]

You can work with the data using a defaultdict and unpacking the result into a list comprehension:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for items in res:
    result[items['name']]['location'].append(items['location'])
    result[items['name']]['url'].append(items['url'])

final = [
    {'name': name, **{inner_names: ' '.join(inner_values) for inner_names, inner_values in values.items()}}
    for name, values in result.items()
]

And final is:

In [57]: final
Out[57]:
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

Upvotes: 2

Shubho Shaha

Reputation: 2139

Since your dataset is relatively small, I guess time complexity is not a big deal here, so you could consider the following code.

from collections import defaultdict
given_data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
] 
D = defaultdict(list)
for item in given_data:
    D[item['name']].append(item)
result = []
for x in D:
    # Join the values of each group with a single space
    urls = " ".join(pp['url'] for pp in D[x])
    locations = " ".join(pp['location'] for pp in D[x])
    result.append({'name': x, 'url': urls, 'location': locations})

Upvotes: 4

Mika72

Reputation: 411

Something like this? Small deviation: I preferred to store urls and locations in lists inside resDict, rather than in concatenated strings.

myDict = [
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}
]

resDict = []

def getKeys(d):
    arr = []
    for row in d:
        arr.append(row["name"])
    ret = list(set(arr))
    return ret

def filteredDict(d, k):
    arr = []
    for row in d:
        if row["name"] == k:
            arr.append(row)
    return arr

def compressedDictRow(rowArr):
    urls = []
    locations = []
    name = rowArr[0]['name']

    for row in rowArr:
       urls.append(row['url'])
       locations.append(row['location'])
    return {"name":name,"urls":urls, "locations":locations}

keys = getKeys(myDict)

for key in keys:
    rowArr = filteredDict(myDict,key)
    row = compressedDictRow(rowArr)
    resDict.append(row)
print(resDict)

Outputs (in one line):

[
    {'name': 'A2', 'urls': ['B3', 'B4'], 'locations': ['C3', 'C4']}, 
    {'name': 'A1', 'urls': ['B1', 'B2'], 'locations': ['C1', 'C2']}
]
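
If the comma-separated strings from the question are needed after all, the list form above can be converted afterwards. This is only a sketch: the function name to_joined and the sep parameter are made up for illustration, and the rows are copied from the output above. The final sort is optional and is there only because set() does not preserve the original order of names.

```python
rows = [
    {"name": "A2", "urls": ["B3", "B4"], "locations": ["C3", "C4"]},
    {"name": "A1", "urls": ["B1", "B2"], "locations": ["C1", "C2"]},
]

def to_joined(row, sep=", "):
    """Turn the list-valued row into the string-valued form from the question."""
    return {
        "name": row["name"],
        "url": sep.join(row["urls"]),
        "location": sep.join(row["locations"]),
    }

# Sort by name so the result order does not depend on set() ordering
joined = sorted((to_joined(r) for r in rows), key=lambda r: r["name"])
print(joined)
```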

Upvotes: 0

vishal

Reputation: 1205

Based on @Yaroslav Surzhikov's comment, here is a solution using itertools.groupby:

from itertools import groupby

dicts = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]

def merge(dicts):
    new_list = []
    # groupby only groups consecutive items, so sort by name first
    by_name = lambda x: x['name']
    for key, group in groupby(sorted(dicts, key=by_name), by_name):
        new_item = {'name': key, 'url': [], 'location': []}
        for item in group:
            new_item['url'].append(item.get('url', ''))
            new_item['location'].append(item.get('location', ''))
        # Collapse the collected values into comma-separated strings
        new_item['url'] = ', '.join(new_item['url'])
        new_item['location'] = ', '.join(new_item['location'])
        new_list.append(new_item)
    return new_list

print(merge(dicts))
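
One thing to keep in mind: itertools.groupby only groups consecutive items with equal keys, so the input should be sorted (or already ordered) by name. A small sketch of the difference, using made-up sample data and a hypothetical helper group_names:

```python
from itertools import groupby

unsorted_dicts = [
    {"name": "A1", "url": "B1"},
    {"name": "A2", "url": "B3"},
    {"name": "A1", "url": "B2"},
]

def group_names(records):
    """Return the sequence of group keys that groupby produces for records."""
    return [key for key, _ in groupby(records, key=lambda r: r["name"])]

# Without sorting, "A1" shows up as two separate groups
print(group_names(unsorted_dicts))  # ['A1', 'A2', 'A1']

# After sorting by name, each name forms exactly one group
print(group_names(sorted(unsorted_dicts, key=lambda r: r["name"])))  # ['A1', 'A2']
```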

Upvotes: 0
