PParker
PParker

Reputation: 1511

How to get the most pairs out of my pandas dataframe?

I have a dataframe that shows for every row a human pair (name_1 and name_2) together with the corresponding score. The score is a numeric value and represents how well these two people fit together. The higher the score, the better the match between person 1 (name_1) and person 2 (name_2).

As you can see, some names can be found twice or more. Of course, one person can only be matched once. My goal is, to find as many pairs in the dataframe as possible and write each of them into a second dataframe.

The problem that makes me struggle is this:

I think I can get max. 8 pairs out of the dataframe since I have 8 different names in the first column. Unfortunately, the scores for best matches are not clearly separated. One person can match with multiple other persons whereas other persons only can match to one specific person. I am not much interested in the matching-score. I am interested in not loosing any person due to a bad choice of pair combination.

I a looking for a way to find and extract as many pairs of of the dataframe.

This is the dataframe df:

      name_1     name_2  score
27      allen      jolly    1.8
23       anna       rock    2.8
22       anna  christina    1.1
26  christina       rock    2.3
24  christina      allen    1.4
25  christina      jolly    1.4
18      emily       rock    3.7
15      emily  sabastein    3.3
16      emily       anna    2.5
17      emily  christina    2.4
4       jacob      jolly    3.4
1       jacob       rick    2.9
3       jacob      allen    2.4
0       jacob       mary    2.3
2       jacob  christina    2.0
7        mary      jolly    1.7
5        mary       rick    1.4
6        mary  christina    1.3
14       rick       rock    2.8
9        rick  sabastein    2.8
8        rick      emily    2.5
13       rick      jolly    2.3
11       rick  christina    2.1
10       rick       anna    2.0
12       rick      allen    1.5
21  sabastein       rock    3.6
19  sabastein       anna    2.8
20  sabastein  christina    1.9

I think the best matching in terms of total maximum score is:

emely       rock        3.7
jacob       jolly       3.4
sabastein   anna        2.8
rick        allen       1.5
mary        christina   1.3 

I am not absolutely sure if this is also the maximum number of pairs I can get. If you know how to get best pairs (see above) or maximum number of pairs, I would be really happy to see.

Upvotes: 0

Views: 300

Answers (1)

SpghttCd
SpghttCd

Reputation: 10860

EDIT
In the meantime I found a very convenient function to create a graph from a dataframe, but you should rename your column score to weight for this:
The you could simply write:

G = nx.from_pandas_edgelist(df, 'name_1', 'name_2', 'weight')
mate = nx.max_weight_matching(G)

and that's it.
(Rest is still part of our discussion below, how you process the result further on...)


My approach would be

import pandas as pd
import networkx as nx

df['edges'] = df.apply(lambda r: (r.name_1, r.name_2, {'weight': r.score}), axis=1)

G = nx.Graph()

allnames = set(df.loc[:, ['name_1', 'name_2']].values.flatten())

for s in allnames:
    G.add_node(s)
G.add_edges_from(df.edges)

mate = nx.max_weight_matching(G)

Result:

res = pd.DataFrame(list(mate), columns=['name_1', 'name_2'])
res['score'] = res.apply(lambda r: G[r[0]][r[1]]['weight'], axis=1)

print(res)
print(f'\nMatchings: {len(res)}\nTotal Score: {res.score.sum():.1f}')            

#      name_1     name_2  score
#0       rock      emily    3.7                            
#1       rick  christina    2.1                          
#2       mary      jacob    2.3                            
#3  sabastein       anna    2.8                           
#4      jolly      allen    1.8                                             
#Matchings: 5                                                
#Total Score: 12.7      

DocSources:
For setting up the Graph you already had the correct link.
For the maximum_matching function see here https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.matching.max_weight_matching.html#networkx.algorithms.matching.max_weight_matching

Upvotes: 1

Related Questions