I have two lists that share information. First, I want a unique set of names (list_person has repeated name values), so I build a new list of dictionaries. Then I want to append list_pets['pet'] to the matching list_person['pets'] in that new list of unique names whenever list_pets['person_id'] matches list_person['id'].
For clarification, here is my code and the desired output.
My current code:
list_person = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat']},  # you see that name values are repeated
               {'id': 678910, 'name': 'Bobby Bobs', 'pets': ['zebra']},
               {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse']},
               {'id': 141516, 'name': 'Lisa Bobs', 'pets': ['rabbit']}]
list_pets = [{'id': 'abcd', 'pet': 'shark', 'person_id': 12345},  # Bobby Bobs' pets
             {'id': 'efgh', 'pet': 'tiger', 'person_id': 678910},  # Bobby Bobs' pets
             {'id': 'ijkl', 'pet': 'elephant', 'person_id': 111213},  # Lisa Bobs' pets
             {'id': 'mnopq', 'pet': 'dog', 'person_id': 141516}]  # Lisa Bobs' pets
output = []
for person, pet in zip(list_person, list_pets):
    t = [temp_dict['name'] for temp_dict in output]
    if person['name'] not in t:
        output.append(person)  # make a new list of dicts with unique name values
        for unique_person in output:  # if they share ID, add the missing pets.
            if person['id'] == pet['person_id']:
                unique_person['pets'].append(pet['pet'])
print(output)
Desired output:
desired_out = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat', 'zebra', 'shark', 'tiger']},
               {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse', 'rabbit', 'elephant', 'dog']}]
Current output:
[{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat', 'shark', 'elephant']}, {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse', 'elephant']}]
My current output does not contain all the correct pets. Why is that, and what would get me closer to the solution?
Upvotes: 3
Here's a non-pandas solution that doesn't rely on any ordering relationship between list_person (aka 'people') and list_pets, so I'm not assuming that Bobby's data is the first two entries in both lists.
Initially, output will be a mapping from names to each person's data, including pets. ids maps each of a person's IDs to that same data, intentionally using a reference to the data dict and not a copy.
Note that when a person is added to output, it is added as a deepcopy so that the original item in list_person isn't affected.
import copy

output = {}  # dict, not list
ids = {}  # needed to match with pets, which only have person_id
for person in list_person:
    if (name := person['name']) in output:
        output[name]['pets'].extend(person['pets'])
        output[name]['id'].append(person['id'])
        ids[person['id']] = output[name]  # intentionally a reference, not a copy
    else:
        output[name] = copy.deepcopy(person)  # so that the pet list is created as a copy
        output[name]['id'] = [output[name]['id']]  # turn the id into a list of ids
        ids[person['id']] = output[name]  # intentionally a reference, not a copy
for pet in list_pets:
    # several keys in ids can reference the same dict object,
    # so use that to our advantage by appending directly to its 'pets' list
    ids[pet['person_id']]['pets'].append(pet['pet'])
output is now:
{'Bobby Bobs': {'id': [12345, 678910],
                'name': 'Bobby Bobs',
                'pets': ['cat', 'zebra', 'shark', 'tiger']},
 'Lisa Bobs': {'id': [111213, 141516],
               'name': 'Lisa Bobs',
               'pets': ['horse', 'rabbit', 'elephant', 'dog']}
}
Final step: make it a list and keep only one id for each person:
output = list(output.values())
for entry in output:
    entry['id'] = entry['id'][0]  # just the first id
Final output:
[{'id': 12345,
  'name': 'Bobby Bobs',
  'pets': ['cat', 'zebra', 'shark', 'tiger']},
 {'id': 111213,
  'name': 'Lisa Bobs',
  'pets': ['horse', 'rabbit', 'elephant', 'dog']}]
And if you don't mind multiple ids, skip the for loop in the last step and leave it at output = list(output.values()).
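For reuse, the same two-pass idea can be wrapped in a function. This is only a sketch under a few assumptions: the name merge_people_and_pets and its parameters are made up for illustration, the first id seen for a name is kept as-is instead of building an id list, and pets whose person_id matches nobody are silently skipped. Because pets are looked up by id rather than paired by position, it also sidesteps the zip-based pairing in the question's loop.

```python
import copy


def merge_people_and_pets(people, pets):
    """Hypothetical helper: merge people by name, then attach pets by person_id."""
    by_name = {}  # name -> merged record (dicts keep insertion order)
    by_id = {}    # every person id -> the merged record for that person's name
    for person in people:
        name = person['name']
        if name in by_name:
            by_name[name]['pets'].extend(person['pets'])
        else:
            by_name[name] = copy.deepcopy(person)  # leave the input untouched
        by_id[person['id']] = by_name[name]  # a reference, not a copy
    for pet in pets:
        record = by_id.get(pet['person_id'])
        if record is not None:  # assumption: ignore pets without a known owner
            record['pets'].append(pet['pet'])
    return list(by_name.values())


list_person = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat']},
               {'id': 678910, 'name': 'Bobby Bobs', 'pets': ['zebra']},
               {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse']},
               {'id': 141516, 'name': 'Lisa Bobs', 'pets': ['rabbit']}]
list_pets = [{'id': 'abcd', 'pet': 'shark', 'person_id': 12345},
             {'id': 'efgh', 'pet': 'tiger', 'person_id': 678910},
             {'id': 'ijkl', 'pet': 'elephant', 'person_id': 111213},
             {'id': 'mnopq', 'pet': 'dog', 'person_id': 141516}]

result = merge_people_and_pets(list_person, list_pets)
print(result)
```

With the sample data from the question, result should match desired_out, and list_person itself is left unchanged.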
Upvotes: 1
import itertools
import pandas as pd

person_df = pd.DataFrame(list_person)
pets_df = pd.DataFrame(list_pets).drop(columns=['id'])  # the pets' own ids are not needed
joined_df = person_df.merge(pets_df, left_on=['id'], right_on=['person_id'])
Joined df (right after the merge, before the pet column is folded into pets):
>>> joined_df
       id        name      pets       pet  person_id
0   12345  Bobby Bobs     [cat]     shark      12345
1  678910  Bobby Bobs   [zebra]     tiger     678910
2  111213   Lisa Bobs   [horse]  elephant     111213
3  141516   Lisa Bobs  [rabbit]       dog     141516
Next, append each row's pet value to its pets list, then group by name:
joined_df['pets'] = [pets + [pet] for pets, pet in zip(joined_df['pets'], joined_df['pet'])]
final_list = joined_df.groupby('name', as_index=False).agg(
    id=('id', 'first'),  # keep the first id seen for each name
    pets=('pets', lambda x: list(itertools.chain(*x)))  # flatten the group's pet lists
).to_dict('records')
Output:
>>> final_list
[{'name': 'Bobby Bobs', 'id': 12345, 'pets': ['cat', 'shark', 'zebra', 'tiger']},
{'name': 'Lisa Bobs', 'id': 111213, 'pets': ['horse', 'elephant', 'rabbit', 'dog']}]
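The lambda in the agg step only flattens one group's lists into a single list. In isolation, with a toy list standing in for one group's pets values (the variable names here are made up for illustration), the pattern looks like this; itertools.chain.from_iterable is an equivalent spelling that avoids unpacking with *:

```python
import itertools

# Toy stand-in for the 'pets' values of one name group
group = [['cat', 'shark'], ['zebra', 'tiger']]

flat_star = list(itertools.chain(*group))               # as used in the agg lambda
flat_iter = list(itertools.chain.from_iterable(group))  # same result, no unpacking

print(flat_star)
print(flat_iter)
```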
Upvotes: 1