Reputation: 115
I have been working on this problem for what seems like a very long time. I have a dictionary that looks like this:
{'1': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
       'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
 '2': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
       'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
 '3': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0, 'Superman Returns': 3.5,
       'The Night Listener': 4.0}}
The real dictionary is a lot bigger. What I am trying to find is the list or set of pairs of ids that have at least 2 movies in common with each other. Something must be wrong with my attempt, because the first key should be checked against the second one, then the first key against the third one, and so on until the keys run out; then the second key against the third key, and so on until there are no more keys. Then it is the turn of the third key.
Finally I want to get only the keys that have at least 2 movies in common.
I tried doing this:
import collections
from collections import Counter

def sim_critics(movies):
    similarRaters = set()
    first = 1
    lastCritic = ''
    movie_over = collections.defaultdict(list)
    movCount = Counter(movie for v in movies.values() for movie in v)
    for num in movies:
        for movie, _ in movies[num].items():
            movie_over[movie].append(num)
    for critic, _ in movie_over.items():
        if first != 1:
            critic_List = collections.Counter(movie_over[critic])
            critic2_list = collections.Counter(movie_over[lastCritic])
            overlap = list((critic_List & critic2_list).elements())
            if len(overlap) >= 2:
                key = critic + " and " + lastCritic
                similarRaters.add(key)
        lastCritic = critic
        first = 2
    return similarRaters
Upvotes: 1
Views: 84
Reputation: 2795
A simple solution would be to do this:
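def simCritics(movies):
    matchingDicts = set()
    for m in movies:
        for i in movies:
            # m and i are critic ids; movies[m] and movies[i] are their rating dicts.
            # len(a) + len(b) - len(a union b) is the number of shared movies.
            shared = (len(movies[m]) + len(movies[i])) - len(set(movies[m]).union(movies[i]))
            if shared >= 2:
                matchingDicts.add((m, i))
    # Drop the pairs that compare a critic with itself.
    myList = [i for i in list(matchingDicts) if i[0] != i[1]]
    myL = []
    for i in myList:
        # Skip a pair if its mirror image was already kept.
        if (i[1], i[0]) in myL:
            continue
        myL.append(i)
    return myL
The comparison in the middle (the one that compares the lengths) is crucial: the union removes duplicates, so the sum of the two lengths minus the size of the union is exactly the number of movies the two critics share. Requiring that difference to be at least 2 keeps only the pairs with at least 2 movies in common. Note that the lengths must be taken from movies[m] and movies[i], the per-critic dictionaries, because m and i themselves are only the id strings.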
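The pairwise walk described in the question (first key against the second, then against the third, and so on) can also be sketched with itertools.combinations and a set intersection. This is only a minimal sketch under stated assumptions: the function name sim_critics_pairs, the min_shared parameter, and the cut-down ratings sample are made up for illustration and are not part of the original code.

```python
from itertools import combinations

def sim_critics_pairs(movies, min_shared=2):
    """Return pairs of critic ids that share at least min_shared movies.

    Hypothetical helper for illustration; min_shared is an assumed parameter.
    """
    pairs = []
    # combinations() yields every unordered pair of ids exactly once,
    # so no self-pairs or mirrored duplicates need filtering afterwards.
    for a, b in combinations(sorted(movies), 2):
        # Intersect the two sets of movie titles to find the shared ones.
        shared = set(movies[a]) & set(movies[b])
        if len(shared) >= min_shared:
            pairs.append((a, b))
    return pairs

# Cut-down, illustrative subset of the ratings (not the full data set).
ratings = {
    '1': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0},
    '2': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Superman Returns': 5.0},
    '3': {'Superman Returns': 3.5, 'The Night Listener': 4.0},
}

print(sim_critics_pairs(ratings))  # [('1', '2')]
```

Critics '1' and '2' share two titles, while '2' and '3' share only one, so only the first pair is reported. Sorting the ids first keeps the output order stable from run to run.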
Upvotes: 1