Amistad

Reputation: 7400

Merging two lists of dicts based on a common value

I have two lists of dicts, as follows:

list1 = [
{'name':'john',
'gender':'male',
'grade': 'third'
},
{'name':'cathy',
'gender':'female',
'grade':'second'
},
]

list2 = [
{'name':'john',
'physics':95,
'chemistry':89
},
{'name':'cathy',
'physics':78,
'chemistry':69
},
]

The output list I need is as follows:

final_list = [
{'name':'john',
'gender':'male',
'grade':'third',
'marks': {'physics':95, 'chemistry': 89}
},
{'name':'cathy',
'gender':'female',
'grade':'second',
'marks': {'physics':78, 'chemistry': 69}
},
]

First I tried iterating as follows:

final_list = []
for item1 in list1:
    for item2 in list2:
        if item1['name'] == item2['name']:
            temp = dict(item2)
            temp.pop('name')
            final_list.append(dict(name=item1['name'], **temp))

However, this does not give me the desired result. I also tried pandas, where I have limited experience:

>>> import pandas as pd
>>> df1 = pd.DataFrame(list1)
>>> df2 = pd.DataFrame(list2)
>>> result = pd.merge(df1, df2, on=['name'])

However, I don't know how to get the data back into the format I need. Any help would be appreciated.

Upvotes: 1

Views: 80

Answers (4)

Padraic Cunningham

Reputation: 180391

Since you want a list of dicts as output, you can do this easily without pandas. Use a dict keyed by name to hold all the info and make one pass over each list, instead of the O(n^2) nested loops in your own code:

out = {d["name"]: d for d in list1}
for d in list2:
    out[d.pop("name")]["marks"] = d


from pprint import pprint as pp

pp(list(out.values()))

Output:

[{'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69, 'physics': 78},
  'name': 'cathy'},
 {'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89, 'physics': 95},
  'name': 'john'}]

That reuses (and modifies) the dicts in your lists. If you want to create new dicts instead:

out = {d["name"]: d.copy() for d in list1}

for d in list2:
    marks = d.copy()  # copy first so the dicts in list2 are left unchanged
    out[marks.pop("name")]["marks"] = marks

from pprint import pprint as pp

pp(list(out.values()))

The output is the same:

[{'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69, 'physics': 78},
  'name': 'cathy'},
 {'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89, 'physics': 95},
  'name': 'john'}]
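
If list2 has to stay untouched and some names might appear in only one list, the same dict-by-name idea can be wrapped in a small function. This is only a sketch: `merge_marks` and its `key` parameter are hypothetical names that are not part of the code above, and it assumes names are unique within each list.

```python
def merge_marks(people, scores, key="name"):
    """Return new dicts from `people`, each with a nested 'marks' dict.

    Hypothetical helper: neither input list is modified. Entries in
    `scores` whose name is missing from `people` are skipped.
    """
    # Index shallow copies of the people dicts by name (assumes unique names).
    merged = {d[key]: dict(d) for d in people}
    for d in scores:
        target = merged.get(d[key])
        if target is None:
            continue  # no matching person; skip rather than raise KeyError
        # Everything except the key column becomes the nested marks dict.
        target["marks"] = {k: v for k, v in d.items() if k != key}
    return list(merged.values())


list1 = [
    {"name": "john", "gender": "male", "grade": "third"},
    {"name": "cathy", "gender": "female", "grade": "second"},
]
list2 = [
    {"name": "john", "physics": 95, "chemistry": 89},
    {"name": "cathy", "physics": 78, "chemistry": 69},
]

final_list = merge_marks(list1, list2)
```

People without a matching entry in the second list simply come back without a 'marks' key; whether that or an error is the better behaviour depends on your data.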

Upvotes: 1

Nader Hisham

Reputation: 5414

Create a function that adds a marks column to each row of the merged DataFrame (result in your question). This column should contain a dictionary of the physics and chemistry marks:

def create_marks(df):
    df['marks'] = { 'chemistry' : df['chemistry'] , 'physics' : df['physics'] }
    return df

result_with_marks = result.apply( create_marks , axis = 1)

Out[19]:
gender  grade   name    chemistry   physics            marks
male    third   john    89             95   {u'chemistry': 89, u'physics': 95}
female  second  cathy   69             78   {u'chemistry': 69, u'physics': 78}

Then convert it to your desired result as follows:

result_with_marks.drop( ['chemistry' , 'physics'], axis = 1).to_dict(orient = 'records')

Out[20]:
[{'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89L, 'physics': 95L},
  'name': 'john'},
 {'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69L, 'physics': 78L},
  'name': 'cathy'}]

Upvotes: 1

Zero

Reputation: 76917

You can first merge both dataframes:

In [144]: df = pd.DataFrame(list1).merge(pd.DataFrame(list2))

The result looks like this:

In [145]: df
Out[145]:
   gender   grade   name  chemistry  physics
0    male   third   john         89       95
1  female  second  cathy         69       78

Then create a marks column that holds the marks as a dict (wrapped in a one-element list here):

In [146]: df['marks'] = df.apply(lambda x: [x[['chemistry', 'physics']].to_dict()], axis=1)

In [147]: df
Out[147]:
   gender   grade   name  chemistry  physics  \
0    male   third   john         89       95
1  female  second  cathy         69       78

                                  marks
0  [{u'chemistry': 89, u'physics': 95}]
1  [{u'chemistry': 69, u'physics': 78}]

Finally, call to_dict(orient='records') on the selected columns of the dataframe:

In [148]: df[['name', 'gender', 'grade', 'marks']].to_dict(orient='records')
Out[148]:
[{'gender': 'male',
  'grade': 'third',
  'marks': [{'chemistry': 89L, 'physics': 95L}],
  'name': 'john'},
 {'gender': 'female',
  'grade': 'second',
  'marks': [{'chemistry': 69L, 'physics': 78L}],
  'name': 'cathy'}]

Upvotes: 3

tzaman

Reputation: 47780

Using your pandas approach, you can call

result.to_dict(orient='records')

to get it back as a list of dictionaries. It won't put marks in as a sub-field though, since there's nothing telling it to do that. physics and chemistry will just be fields on the same level as the rest.
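
To get the nested shape from those flat records, one option is a small post-processing step in plain Python. The sketch below is an illustration only: `nest_marks` and `base_keys` are hypothetical names, and `flat` is typed in by hand to stand in for what `result.to_dict(orient='records')` is expected to return for the data in the question.

```python
def nest_marks(records, base_keys=("name", "gender", "grade")):
    """Move every field that is not in `base_keys` under a 'marks' sub-dict."""
    nested = []
    for rec in records:
        # Keep the identifying fields at the top level.
        row = {k: rec[k] for k in base_keys if k in rec}
        # Everything else is treated as a subject mark.
        row["marks"] = {k: v for k, v in rec.items() if k not in base_keys}
        nested.append(row)
    return nested


# Hand-written stand-in for result.to_dict(orient='records').
flat = [
    {"name": "john", "gender": "male", "grade": "third",
     "physics": 95, "chemistry": 89},
    {"name": "cathy", "gender": "female", "grade": "second",
     "physics": 78, "chemistry": 69},
]

final_list = nest_marks(flat)
```

Listing the top-level keys explicitly means any new subject column added later ends up under 'marks' without further changes.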

You may also be having problems because your name is 'cathy' in the first list and 'kathy' in the second, which naturally won't get merged.
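
A quick way to spot that kind of mismatch before merging is to compare the sets of names in the two lists. This is a minimal sketch; `unmatched_names` is a hypothetical helper, and the two short lists only reproduce the spelling difference described above.

```python
def unmatched_names(left, right, key="name"):
    """Return (names only in `left`, names only in `right`), each sorted."""
    left_names = {d[key] for d in left}
    right_names = {d[key] for d in right}
    return sorted(left_names - right_names), sorted(right_names - left_names)


# Illustrative data with the 'cathy' / 'kathy' spelling difference.
list1 = [{"name": "john"}, {"name": "cathy"}]
list2 = [{"name": "john"}, {"name": "kathy"}]

only_in_first, only_in_second = unmatched_names(list1, list2)
```

If both returned lists are empty, every name has a partner in the other list and an inner merge on name should not drop any rows.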

Upvotes: 1
