Erwin

Reputation: 381

Combine dictionaries in a list based on matching multiple key:value pairs

I am new to Python. Searching Google, I found a question that is very similar to my case: "combine dictionaries in list of dictionaries based on matching key:value pair". In my case, however, suppose I have the following list:

[{'entity': 'Mechanical properties',
  'offsetstart': 0,
  'offsetend': 21,
  'id': 'c_4683'},
 {'entity': 'properties',
  'offsetstart': 11,
  'offsetend': 21,
  'id': 'c_49874'},
 {'entity': 'properties',
  'offsetstart': 11,
  'offsetend': 21,
  'id': 'c_13609'},
 {'entity': 'wood',
  'offsetstart': 33,
  'offsetend': 37,
  'id': 'c_8421'}]

How can I combine the values of the "id" key for dictionaries that match on multiple key:value pairs (i.e., the values of the keys "entity", "offsetstart", and "offsetend"), so that I get the following desired result?

[{'entity': 'Mechanical properties',
  'offsetstart': 0,
  'offsetend': 21,
  'id': 'c_4683'},
 {'entity': 'properties',
  'offsetstart': 11,
  'offsetend': 21,
  'id': ['c_49874', 'c_13609']},
 {'entity': 'wood',
  'offsetstart': 33,
  'offsetend': 37,
  'id': 'c_8421'}]

Thank you so much for any help!

Upvotes: 2

Views: 427

Answers (3)

Fran Arenas

Reputation: 648

output_list = []

for entity_dict in initial_list:  # initial_list is the list of dicts from the question
    # Copy the dict (so the input is not mutated) and change the id string to a list
    new_dict = dict(entity_dict, id=[entity_dict['id']])
    key = (new_dict['entity'], new_dict['offsetstart'], new_dict['offsetend'])

    for output_dict in output_list:  # Check whether the same entity was already seen
        if (output_dict['entity'], output_dict['offsetstart'], output_dict['offsetend']) == key:
            output_dict['id'] += new_dict['id']
            break
    else:  # Runs only if the loop did not break: first entity of its kind
        output_list.append(new_dict)

print(output_list)

[{'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': ['c_4683']}, {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': ['c_49874', 'c_13609']}, {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']}]

I would recommend changing your input list to a dict that uses the entity as its key. In general, a list of dicts is a poor fit for this kind of task, because every lookup has to scan the whole list.
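A minimal sketch of that suggestion, assuming each entity name always comes with the same offsets (true for the sample data, where both 'properties' records share offsets 11 to 21). The names `entities` and `add_record` are hypothetical, chosen only for illustration.

```python
# Hypothetical layout: one dict keyed by entity name instead of a list of dicts.
# Assumes an entity name always comes with the same offsets.
entities = {}

def add_record(entities, record):
    """Insert `record` into `entities`, collecting ids for repeated entities."""
    entry = entities.setdefault(record['entity'], {
        'offsetstart': record['offsetstart'],
        'offsetend': record['offsetend'],
        'id': [],
    })
    entry['id'].append(record['id'])

initial_list = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': 'c_4683'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_49874'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_13609'},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': 'c_8421'},
]

for record in initial_list:
    add_record(entities, record)

# Finding an entity is now a single dict access instead of a scan over the list
print(entities['properties']['id'])  # ['c_49874', 'c_13609']
```

If the same entity name can appear at different offsets, the key would need to include the offsets too, e.g. a tuple of all three fields.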

Upvotes: 2

timgeb

Reputation: 78650

I would advise you to use a single type for the values of the 'id' fields, either lists or strings, not both. Since a record can carry several ids, use lists: even if there is only one id, use a list with one element. Mixing both data structures is inconsistent and may lead to bugs (both strings and lists are iterable).
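A small illustration of that pitfall: iterating a bare string id yields its characters, whereas a one-element list yields the id itself. The `records` data is made up for the demonstration.

```python
# Made-up records mixing the two representations of 'id'
records = [
    {'entity': 'wood', 'id': 'c_8421'},                      # a bare string
    {'entity': 'properties', 'id': ['c_49874', 'c_13609']},  # a list
]

# Iterating a str walks over its characters, so the lone id gets split up
broken_ids = [i for r in records for i in r['id']]
print(broken_ids)
# ['c', '_', '8', '4', '2', '1', 'c_49874', 'c_13609']

# Storing every id as a list makes the same expression behave as intended
records[0]['id'] = ['c_8421']
fixed_ids = [i for r in records for i in r['id']]
print(fixed_ids)  # ['c_8421', 'c_49874', 'c_13609']
```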

Solution (data is your list of dicts):

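ID_KEY = 'id'
tmp = {}  # maps (entity, offsetstart, offsetend) -> merged dict

for d in data:
    fields = d['entity'], d['offsetstart'], d['offsetend']
    id_ = d[ID_KEY]
    if fields in tmp:
        # Combination seen before: just collect the id
        tmp[fields][ID_KEY].append(id_)
    else:
        # First occurrence: copy the dict so `data` is not modified
        d_copy = d.copy()
        d_copy[ID_KEY] = [id_]
        tmp[fields] = d_copy

result = list(tmp.values())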

Output:

>>> result
[{'entity': 'Mechanical properties',
  'offsetstart': 0,
  'offsetend': 21,
  'id': ['c_4683']},
 {'entity': 'properties',
  'offsetstart': 11,
  'offsetend': 21,
  'id': ['c_49874', 'c_13609']},
 {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']}]
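If you really need the exact format from the question, where a lone id stays a plain string, one option is to post-process `result`. This is only a sketch; as advised above, keeping lists everywhere is usually the safer choice.

```python
# `result` as produced by the solution above (ids always in lists)
result = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': ['c_4683']},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': ['c_49874', 'c_13609']},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']},
]

# Unwrap single-element id lists; build new dicts rather than mutating `result`
desired = [
    {**d, 'id': d['id'][0] if len(d['id']) == 1 else d['id']}
    for d in result
]
print(desired)
```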

Upvotes: 1

Niel Godfrey P. Ponciano

Reputation: 10699

Here is a solution:

  • Iterate over each item in the data.
  • Use the values of entity, offsetstart, and offsetend as the key of a dictionary whose value is the list of ids.
  • For every item that matches those values, append its id. Because a dictionary is a hash table, each lookup and append takes O(1) time on average, so grouping the whole list is O(n).
  • Reconstruct the dictionaries from the grouped keys and id lists.
from collections import defaultdict

data = [
  {'entity': 'Mechanical properties',
   'offsetstart': 0,
   'offsetend': 21,
   'id': 'c_4683'},
  {'entity': 'properties',
   'offsetstart': 11,
   'offsetend': 21,
   'id': 'c_49874'},
  {'entity': 'properties',
   'offsetstart': 11,
   'offsetend': 21,
   'id': 'c_13609'},
  {'entity': 'wood',
   'offsetstart': 33,
   'offsetend': 37,
   'id': 'c_8421'}
]

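data_groups = defaultdict(list)
for record in data:
    # Key on every field except 'id'; reading instead of pop() leaves `data` intact
    record_attrs = tuple(sorted((k, v) for k, v in record.items() if k != 'id'))
    data_groups[record_attrs].append(record['id'])

data_result = []
for attrs, ids in data_groups.items():
    # Rebuild a dict from the (key, value) pairs, then attach the collected ids
    data_result.append(dict([*attrs, ("id", ids)]))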

print(data_result)

Output (pretty-printed; the keys appear in sorted order because of `sorted`):

[
    {'entity': 'Mechanical properties', 'offsetend': 21, 'offsetstart': 0, 'id': ['c_4683']},
    {'entity': 'properties', 'offsetend': 21, 'offsetstart': 11, 'id': ['c_49874', 'c_13609']},
    {'entity': 'wood', 'offsetend': 37, 'offsetstart': 33, 'id': ['c_8421']}
]
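A variant of the same defaultdict idea that groups on an explicit tuple of field names, so the output keys follow that tuple (matching the question's layout) instead of sorted order. The function name `merge_ids` and its parameters are hypothetical, chosen for this sketch; note that fields not listed in `group_keys` (other than the id) are dropped.

```python
from collections import defaultdict

def merge_ids(records, group_keys=('entity', 'offsetstart', 'offsetend'), id_key='id'):
    """Group records that agree on every field in `group_keys`, collecting their ids."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[k] for k in group_keys)  # hashable group key
        groups[key].append(record[id_key])
    # dicts keep insertion order (Python 3.7+), so groups appear in first-seen order
    return [
        {**dict(zip(group_keys, key)), id_key: ids}
        for key, ids in groups.items()
    ]

data = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': 'c_4683'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_49874'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_13609'},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': 'c_8421'},
]

merged = merge_ids(data)
print(merged)
```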

Upvotes: 1
