Reputation: 381
I am new to Python. I searched and found this question, which is very similar to my case: "combine dictionaries in list of dictionaries based on matching key:value pair". In my case, however, suppose I have the following list:
[{'entity': 'Mechanical properties',
'offsetstart': 0,
'offsetend': 21,
'id': 'c_4683'},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': 'c_49874'},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': 'c_13609'},
{'entity': 'wood',
'offsetstart': 33,
'offsetend': 37,
'id': 'c_8421'}]
How can I combine the values of the "id" key for dictionaries whose "entity", "offsetstart", and "offsetend" values all match, so that I get the desired result below?
[{'entity': 'Mechanical properties',
'offsetstart': 0,
'offsetend': 21,
'id': 'c_4683'},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': ['c_49874', 'c_13609']},
{'entity': 'wood',
'offsetstart': 33,
'offsetend': 37,
'id': 'c_8421'}]
Thank you so much for any help!
Upvotes: 2
Views: 427
Reputation: 648
output_list = []
for entity_dict in initial_list:
    # Match on all three fields, not just 'entity'
    current_key = (entity_dict['entity'], entity_dict['offsetstart'], entity_dict['offsetend'])
    entity_dict['id'] = [entity_dict['id']]  # Change string to list type (modifies the input dict)
    for output_dict in output_list:  # Check if the same entity was already caught
        if (output_dict['entity'], output_dict['offsetstart'], output_dict['offsetend']) == current_key:
            output_dict['id'] += entity_dict['id']
            break
    else:  # Executed only if no break occurred: this is the first entity of its kind
        output_list.append(entity_dict)
print(output_list)
[{'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': ['c_4683']}, {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': ['c_49874', 'c_13609']}, {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']}]
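I would recommend changing your input list to a dict keyed by the fields you match on (here entity, offsetstart, and offsetend). A list of dicts is a poor fit for this kind of lookup: finding an existing entry means scanning the whole output list for every item, which is what the inner loop above does.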
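A minimal sketch of that recommendation, assuming the question's list as input. The name ids_by_key is hypothetical; each key is the (entity, offsetstart, offsetend) tuple and each value is the list of ids that share it.

```python
# Sample input copied from the question.
initial_list = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': 'c_4683'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_49874'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_13609'},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': 'c_8421'},
]

# Key each group by the fields that must match; the value collects the ids.
ids_by_key = {}
for d in initial_list:
    key = (d['entity'], d['offsetstart'], d['offsetend'])
    ids_by_key.setdefault(key, []).append(d['id'])

# A lookup is now a single dict access instead of a scan over a list.
print(ids_by_key[('properties', 11, 21)])  # ['c_49874', 'c_13609']
```

Because dict lookups are hashed, this avoids the nested loop over output_list.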
Upvotes: 2
Reputation: 78650
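I would advise you to use either lists or strings for the values of the 'id' fields, not a mix of both. Even if there is only one id, use a list with one element. Mixing both data structures is inconsistent and may lead to bugs: both strings and lists are iterable, so code that loops over an id it expects to be a list will silently loop over the characters of a string instead.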
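A small illustration of that bug, using hypothetical records where one 'id' is a string and another is a list. The helper as_list is also hypothetical; it shows one way to normalize values before use.

```python
# Hypothetical records: one 'id' is a plain string, the other is a list.
records = [{'id': 'c_4683'}, {'id': ['c_49874', 'c_13609']}]

all_ids = []
for r in records:
    for i in r['id']:  # iterating a str yields its characters, not the whole id
        all_ids.append(i)
print(all_ids)  # ['c', '_', '4', '6', '8', '3', 'c_49874', 'c_13609']


def as_list(value):
    """Hypothetical helper: wrap a bare string in a one-element list."""
    return [value] if isinstance(value, str) else list(value)


# Normalizing every value to a list first avoids the problem.
fixed_ids = [i for r in records for i in as_list(r['id'])]
print(fixed_ids)  # ['c_4683', 'c_49874', 'c_13609']
```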
Solution (data is your list of dicts):
ID_KEY = 'id'
tmp = {}
for d in data:
    fields = d['entity'], d['offsetstart'], d['offsetend']
    id_ = d[ID_KEY]
    if fields in tmp:
        tmp[fields][ID_KEY].append(id_)
    else:
        d_copy = d.copy()
        d_copy[ID_KEY] = [id_]
        tmp[fields] = d_copy
result = list(tmp.values())
Output:
>>> result
[{'entity': 'Mechanical properties',
'offsetstart': 0,
'offsetend': 21,
'id': ['c_4683']},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': ['c_49874', 'c_13609']},
{'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']}]
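If you really need the exact format from the question (a plain string when there is only one id), a minimal sketch is to post-process the list-based result, keeping in mind the caveat above about mixing strings and lists. The name desired is hypothetical, and the result list is copied in so the sketch runs on its own.

```python
# The list-based result from above, copied so this sketch is self-contained.
result = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': ['c_4683']},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': ['c_49874', 'c_13609']},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': ['c_8421']},
]

# Unwrap one-element id lists back to plain strings; new dicts, result is untouched.
desired = [
    {**d, 'id': d['id'][0] if len(d['id']) == 1 else d['id']}
    for d in result
]
print(desired)
```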
Upvotes: 1
Reputation: 10699
Here is a solution: take the values of entity, offsetstart, and offsetend together (as a tuple) and make them the key of a dictionary whose value is the list of matching ids. Since a dictionary is a hash table, each lookup and insertion takes O(1) time on average, so grouping all n records takes O(n) time overall.
from collections import defaultdict
data = [
{'entity': 'Mechanical properties',
'offsetstart': 0,
'offsetend': 21,
'id': 'c_4683'},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': 'c_49874'},
{'entity': 'properties',
'offsetstart': 11,
'offsetend': 21,
'id': 'c_13609'},
{'entity': 'wood',
'offsetstart': 33,
'offsetend': 37,
'id': 'c_8421'}
]
data_groups = defaultdict(list)
for record in data:
    record_id = record.pop('id')  # note: this removes 'id' from the original dicts in data
    record_attrs = tuple(sorted(record.items()))  # hashable key built from the remaining fields
    data_groups[record_attrs].append(record_id)
data_result = []
for attrs, ids in data_groups.items():
    data_result.append(dict([*attrs, ("id", ids)]))
print(data_result)
Output (pretty-printed):
[
{'entity': 'Mechanical properties', 'offsetend': 21, 'offsetstart': 0, 'id': ['c_4683']},
{'entity': 'properties', 'offsetend': 21, 'offsetstart': 11, 'id': ['c_49874', 'c_13609']},
{'entity': 'wood', 'offsetend': 37, 'offsetstart': 33, 'id': ['c_8421']}
]
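Because record.pop('id') deletes 'id' from the dicts in data, the input is modified as a side effect, and sorting the items changes the key order in the output. A minimal sketch of a non-mutating variant, assuming Python 3.7+ (where dicts keep insertion order); group_ids is a hypothetical name. It keeps the sorted-tuple key so field order does not affect matching, but builds each output dict from a copy of the first record's fields.

```python
data = [
    {'entity': 'Mechanical properties', 'offsetstart': 0, 'offsetend': 21, 'id': 'c_4683'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_49874'},
    {'entity': 'properties', 'offsetstart': 11, 'offsetend': 21, 'id': 'c_13609'},
    {'entity': 'wood', 'offsetstart': 33, 'offsetend': 37, 'id': 'c_8421'},
]


def group_ids(records, id_key='id'):
    """Group records by every field except id_key, without modifying the input."""
    groups = {}  # sorted (field, value) tuple -> output dict
    for record in records:
        # Sorting makes the key independent of the order the fields were inserted in.
        key = tuple(sorted((k, v) for k, v in record.items() if k != id_key))
        if key not in groups:
            # Copy the first record's fields (original key order) and start an id list.
            groups[key] = {k: v for k, v in record.items() if k != id_key}
            groups[key][id_key] = []
        groups[key][id_key].append(record[id_key])
    return list(groups.values())


result = group_ids(data)
print(result)
```

After the call, data still contains its original 'id' strings.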
Upvotes: 1