Doof
Doof

Reputation: 143

Giving a composite score for merged dictionary keys

I am working on a project that needs to say that a certain ID is most likely. Let me explain using example. I have 3 dictionaries that contain ID's and their score

Ex: d1 = {74701: 3, 90883: 2}

I assign percentage score like this,

d1_p = {74701: 60.0, 90883: 40.0} , here the score is the (value of key in d1)/(total sum of values)

Similarly i have 2 other dictionaries

d2 = {90883: 2, 74701: 2} , d2_p = {90883.0: 50.0, 74701.0: 50.0}

d3 = {75853: 2}, d3_p = {75853: 100.0}

My task is to give a composite score for each ID from the above 3 dictionaries a decide a winner by taking the highest score. How would i mathematically assign a composite score between 0-100 for each of these ID's??

Ex: in above case 74701 needs to be the clear winner.

I tried giving average, but it fails, because I need to give more preference for the ID's that occur in multiple dictionaries. Ex: lets say 74701 was majority in d1 and d2 with 30,40 values. then its average will be (30+40+0)/3 = 23.33 , while 75853 which occurs only once with 100% will get (100+0+0)/3 = 33.33 and it will be given as winner, which is wrong.

Hence can somone suggest a good mathematical way in python with maybe code to give such score and decide majority?

Upvotes: 0

Views: 78

Answers (2)

luizbarcelos
luizbarcelos

Reputation: 796

Instead of trying to create a global score from different dictionaries, since your main goal is to analyze frequency I would suggest to summarize all the data into a single dictionary, which is less error prone in general. Say I have 3 dictionaries:

a = {1: 2, 2: 3}
b = {2: 4, 3: 5}
c = {3: 4, 4: 9}

You could summarize these three dictionaries into one by summing the values for each key:

result = {1: 2, 2: 7, 3: 9, 4: 9}

That could be easily achieved by using Counter:

from collections import Counter

result = Counter(a)
result.update(Counter(b))
result.update(Counter(c))
result = dict(result)

Which would yield the desired summary. If you want different weights for each dictionary that could also be done in a similar fashion, the takeaway is that you should not be trying to obtain information from the dictionaries as separate entities, but instead merge them together into one statistic.

Upvotes: 1

FMc
FMc

Reputation: 42421

Think of the data in a tabular way: for each game/match/whatever, each ID gets a certain number of points. If you care the most about overall point total for the entire sequences of games (the entire "season", so to speak), then add up the points to determine a winner (and then scale everything down/up to 0 to 100).

      74701   90883   75853
---------------------------
1         3       2       0
2         2       2       0
3         0       0       2
Total     5       4       2

Alternatively, we can express those same scores in percentage terms per game. Again, every ID must be given a value. In this case, we need to average the percentages -- all of them, including the zeros:

      74701   90883   75853
---------------------------
1        .6      .4       0
2        .5      .5       0
3         0       0     100
Avg     .37     .30     .33

Both approaches could make sense, depending on the context. And both also declare 74701 to be the winner, as desired. But notice that they give different results for 2nd and 3rd place. Such differences occur because the two systems prioritize different things. You need to decide which approach you prefer.

Either way, the first step is to organize the data better. It seems more convenient to have all scores or percentages for each ID, so you can do the needed math with them: that sounds like a dict mapping IDs to lists of scores or percentages.

# Put the data into one collection.
d1 = {74701: 3, 90883: 2}
d2 = {90883: 2, 74701: 2}
d3 = {75853: 2}
raw_scores = [d1, d2, d3]

# Find all IDs.
ids = tuple(set(i for d in raw_scores for i in d))

# Total points/scores for each ID.
points = {
    i : [d.get(i, 0) for d in raw_scores]
    for i in ids
}

# If needed, use that dict to create a similar dict for percentages. Or you
# could create a dict with the same structure holding *both* point totals and
# percentages. Just depends on the approach you pick.
pcts = {}
for i, scores in points.items():
    tot = sum(scores)
    pcts[i] = [sc / tot for sc in scores]

Upvotes: 0

Related Questions