Mirimari

Reputation: 115

compare dictionary of items within dictionary

I have been working on this problem for what seems like a very long time. I have a dictionary that looks like this:

{
    '1': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    '2': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
          'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    '3': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
}

The real dictionary is a lot bigger. What I am trying to find is the list or set of pairs of ids that have at least 2 movies in common with each other. Something must be wrong with my approach, because the first key should be checked against the second one, then the first key against the third one, and so on until the keys run out; then the second key against the third key, and so on; then it is the third key's turn, until there are no more keys.

Finally I want to get only the keys that have at least 2 movies in common.

I tried doing this:

import collections
from collections import Counter

def sim_critics(movies):
    similarRaters=set()

    first=1
    lastCritic= ''

    movie_over = collections.defaultdict(list)
    movCount=Counter(movie  for v in movies.values() for movie in v)

    for num in movies:
        for movie, _ in movies[num].items():
            movie_over[movie].append(num)


    for critic,_ in movie_over.items():
        if first!=1:
            critic_List = collections.Counter(movie_over[critic])
            critic2_list = collections.Counter(movie_over[lastCritic])
            overlap = list((critic_List & critic2_list).elements())

            if len(overlap) >= 2:
                key = critic + " and " + lastCritic 
                similarRaters.add(key)  
        lastCritic= critic
        first=2  
    return similarRaters

Upvotes: 1

Views: 84

Answers (1)

James

Reputation: 2795

A simple solution would be to do this:

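def simCritics(movies):
    matchingDicts = set()
    for m in movies:
        for i in movies:
            # m and i are the id strings; compare the inner dictionaries.
            # A shared title is counted twice in the sum but once in the union.
            shared = len(movies[m]) + len(movies[i]) - len(set(movies[m]).union(movies[i]))
            if shared >= 2:
                matchingDicts.add((m, i))

    # Drop pairs of an id with itself.
    myList = [i for i in matchingDicts if i[0] != i[1]]

    # Keep only one ordering of each pair.
    myL = []
    for i in myList:
        if (i[1], i[0]) in myL:
            continue
        myL.append(i)
    return myL

The comparison in the middle (the one that compares the lengths) is crucial: a movie that both critics rated is counted twice in the sum of the lengths but only once in the union (which removes duplicates), so the difference between the two is the number of movies they have in common. Note that the lengths and the union must be taken over the inner dictionaries movies[m] and movies[i]; m and i themselves are only the id strings.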
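
The same pairwise check could also be sketched with itertools.combinations, which visits each pair of ids exactly once (first with second, first with third, then second with third), together with a set intersection of the two critics' titles. This is only a minimal sketch: the function name similar_pairs, the min_shared parameter and the small ratings sample are hypothetical and introduced here for illustration.

```python
from itertools import combinations


def similar_pairs(movies, min_shared=2):
    """Return pairs of ids whose rating dicts share at least min_shared titles.

    similar_pairs and min_shared are illustrative names, not from the question.
    """
    pairs = []
    # combinations yields each unordered pair of ids once, in dictionary order
    for a, b in combinations(movies, 2):
        # dict key views support set operations, so & gives the shared titles
        shared = movies[a].keys() & movies[b].keys()
        if len(shared) >= min_shared:
            pairs.append((a, b))
    return pairs


# Small made-up sample in the same shape as the question's dictionary
ratings = {
    '1': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
    '2': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5},
    '3': {'Just My Luck': 3.0},
}
print(similar_pairs(ratings))  # [('1', '2')]
```

Because each pair is generated only once, there is no need to filter out self-pairs or reversed duplicates afterwards.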

Upvotes: 1
