Reputation: 393
I have a list of user:friends (50,000) and a list of event attendees (25,000 events and list of attendees for each event). I want to find top k friends with whom the user goes to the event. This needs to be done for each user.
I tried traversing lists but is computationally very expensive. I am also trying to do it by creating weighted graph.(Python)
Let me know if there is any other approach.
Upvotes: 2
Views: 199
Reputation: 226296
Python's collection objects (dictionaries, sets, and collections.Counter) make short work of this task:
from collections import Counter
def top_k_friends(friends, events, k=2):
'''Given a dictionary users mapped to their set of friends
and a dictionary of events mapped to a set of their attendees,
find the top k friends with whom the user goes to the event.
Do this for each user.
'''
for user, users_friends in friends.iteritems():
c = Counter()
for event, attendees in events.iteritems():
if user in attendees:
c.update(users_friends.intersection(attendees))
print user, '-->', c.most_common(k)
if __name__ == '__main__':
friends = {
'robert' : {'mary', 'marty', 'maggie', 'john'},
'paul' : {'marty', 'mary', 'amber', 'susan'}
}
events = {
'derby': {'amber', 'mary', 'robert'},
'pageant': {'maggie', 'paul', 'amber', 'marty', 'john'},
'fireworks': {'susan', 'robert', 'marty', 'paul', 'robert'}
}
top_k_friends(friends, events)
Upvotes: 1
Reputation: 8558
I'd give you a code sample if I better understood what your current data structures look like, but this sounds like a job for a pandas dataframe groupby (in case you don't feel like actually using a database as others have suggested).
Upvotes: 0
Reputation: 5842
Can you do something like this.
Im assuming friends of a user is relatively less, and the events attended by a particular user is also much lesser than total number of events.
So have a boolean vector of attended events for each friend of the user.
Doing a dot product and those that have max will be the friend who most likely resembles the user.
Again,.before you do this..you will have to filter some events to keep the size of your vectors manageable.
Upvotes: 0