John D

Reputation: 41

Return a list of similar authors

I'm trying to write a function that takes a key, looks up each item in that key's list as a key in turn, and returns the combined items (if that makes sense). For example, here's a dictionary of authors and their similar authors:

authors = {
    'Ray Bradbury': ['Harlan Ellison', 'Robert Heinlein', 'Isaac Asimov', 'Arthur Clarke'],
    'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'],
    'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'],
    'Thomas Pynchon': ['Isaac Asimov', 'Jorges Borges', 'Robert Heinlein'],
    'Isaac Asimov': ['Stephen Baxter', 'Ray Bradbury', 'Arthur Clarke', 'Kurt Vonnegut', 'Neil Stephenson'],
    'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']
}

And the function I came up with is this:

def get_similar(author_list, author):
    for item in author_list[author]:
        return author_list[author]

Which only returns the items for the first key. I'd like it to return all of the similar authors, like this:

get_similar(authors, 'Harlan Ellison')

['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 
 'Jeff Vandemeer','Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']

That is, it finds the given key (author), looks at the items listed for that key, and then returns the items listed for each of those keys. In this case Harlan Ellison has four authors listed: Neil Stephenson, Kurt Vonnegut, Richard Morgan, and Douglas Adams. The function then looks up those authors and returns the items listed for them. Kurt Vonnegut returns Terry Pratchett, Tom Robbins, Douglas Adams, Neil Stephenson, and Jeff Vandemeer, and Douglas Adams returns Terry Pratchett, Chris Moore, and Kurt Vonnegut. (Neil Stephenson and Richard Morgan are not keys, so they contribute nothing.)

Duplicates are fine, and I'd like the result in alphabetical order (I assume you could just sort at the end). Any help would be much appreciated, I'm stumped!

Upvotes: 4

Views: 147

Answers (8)

neehari

Reputation: 2612

I would leave the author passed as the parameter out of the output if it appears in one of the looked-up lists. You could use a list comprehension:

def get_similar(author_list, author):
    # Lists of similar authors
    similar = [author_list[auth] for auth in author_list[author] if auth in author_list]

    # Merge the lists and sort the authors. Do not include parameter author
    return sorted(auth for sub in similar for auth in sub if auth != author)



authors = {
    'Ray Bradbury': ['Harlan Ellison', 'Robert Heinlein', 'Isaac Asimov', 'Arthur Clarke'],
    'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'],
    'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'],
    'Thomas Pynchon': ['Isaac Asimov', 'Jorges Borges', 'Robert Heinlein'],
    'Isaac Asimov': ['Stephen Baxter', 'Ray Bradbury', 'Arthur Clarke', 'Kurt Vonnegut', 'Neil Stephenson'],
    'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']
}


>>> get_similar(authors, 'Harlan Ellison')
['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut', 'Neil Stephenson', 'Terry Pratchett', 'Terry Pratchett', 'Tom Robbins']

>>> get_similar(authors, 'Ray Bradbury')  # There's 'Ray Bradbury' in the values of 'Isaac Asimov'
['Arthur Clarke', 'Douglas Adams', 'Kurt Vonnegut', 'Kurt Vonnegut', 'Neil Stephenson', 'Neil Stephenson', 'Richard Morgan', 'Stephen Baxter']

Upvotes: 0

Transhuman

Reputation: 3547

One way is to use a list comprehension with itertools.chain (note that the set removes duplicates):

from itertools import chain

def get_similar(author_list, author):
    # Use the author_list parameter rather than the global authors dict
    return sorted(set(chain(*[v for k, v in author_list.items() if k in author_list[author]])))

get_similar(authors, 'Harlan Ellison')
#['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut', 'Neil Stephenson', 'Terry Pratchett', 'Tom Robbins']
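
Since the question says duplicates are fine, here is a minimal sketch of a variant that keeps them by dropping the set and using chain.from_iterable. The name get_similar_with_duplicates is hypothetical, and the dictionary is trimmed to the entries reachable from 'Harlan Ellison'.

```python
from itertools import chain

# Trimmed copy of the question's data (only the entries needed here)
authors = {
    'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'],
    'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'],
    'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut'],
}

def get_similar_with_duplicates(author_list, author):
    # Hypothetical helper: second-level lookup that keeps duplicates
    first_level = author_list.get(author, [])
    # Skip names that are not keys, then flatten the remaining lists lazily
    second_level = chain.from_iterable(
        author_list[name] for name in first_level if name in author_list
    )
    return sorted(second_level)

print(get_similar_with_duplicates(authors, 'Harlan Ellison'))
# ['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut',
#  'Neil Stephenson', 'Terry Pratchett', 'Terry Pratchett', 'Tom Robbins']
```

Using .get with an empty-list default means an unknown author yields an empty result instead of a KeyError.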

Upvotes: 0

zwer

Reputation: 25789

What you're doing now would work the same way without the for loop: you're essentially doing a single lookup and returning it, hence you get only one entry. What you need to do instead is do your lookup, find the authors, then do a lookup for each of those authors, and so on. The easiest way to do that is to use a bit of recursion:

def get_similar(authors, author):
    return [a for x in authors.pop(author, []) for a in [x] + get_similar(authors, x)]

get_similar(authors, 'Harlan Ellison')

# ['Neil Stephenson', 'Kurt Vonnegut', 'Terry Pratchett', 'Tom Robbins', 'Douglas Adams',
#  'Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut', 'Neil Stephenson', 'Jeff Vandemeer',
#  'Richard Morgan', 'Douglas Adams']

Then all you need to do is to turn it into a set to get rid of the duplicates and then sort it, or if you don't mind a slight performance hit (due to recursion) you can do it right inside your function:

def get_similar(authors, author):
    return sorted(set([a for x in authors.pop(author, []) for a in [x] + get_similar(authors, x)]))

# ['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut', 'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Tom Robbins']

Keep in mind that this modifies your input dictionary to avoid infinite recursion, so if you want to keep your authors dictionary intact call the function as get_similar(authors.copy(), author).
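
If copying the dictionary is not desirable, a possible alternative is to track visited authors in a set instead of popping keys. This is only a sketch: get_similar_recursive and its seen parameter are hypothetical names, and the dictionary is trimmed to the entries reachable from 'Harlan Ellison'.

```python
authors = {
    'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'],
    'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'],
    'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut'],
}

def get_similar_recursive(author_list, author, seen=None):
    # 'seen' records authors already expanded, which prevents infinite recursion
    if seen is None:
        seen = {author}
    result = []
    for name in author_list.get(author, []):
        result.append(name)
        if name not in seen:
            seen.add(name)
            # Expand each author only once; the input dict is never modified
            result.extend(get_similar_recursive(author_list, name, seen))
    return result

found = get_similar_recursive(authors, 'Harlan Ellison')
print(sorted(set(found)))
# ['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut',
#  'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Tom Robbins']
```

The trade-off is one extra set per top-level call, in exchange for leaving the caller's dictionary intact.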

Upvotes: 1

ekhumoro

Reputation: 120598

Here's a simple solution using a set and list comprehension:

def get_similar(author_list, author):
    similar = set(author_list.get(author, []))
    similar.update(*[author_list.get(item, []) for item in similar])
    return sorted(similar)

get_similar(authors, 'Harlan Ellison')

Output:

['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut',
 'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Tom Robbins']

Upvotes: 1

bunji

Reputation: 5213

You are very close but instead of returning after finding the first list of similar authors, you should store all of the authors you find in a list and then return them all after your for loop has finished:

def get_similar(author_list, author):
    similar_authors = []
    for item in author_list[author]:
        if item in author_list:
            similar_authors.extend(author_list[item])
    return similar_authors

Notice that I also added an if statement to make sure that the item is in fact one of the keys in your dictionary so you don't get an error later on (for example: 'Neil Stephenson' is in the dictionary as a member of one of the values but is not a key).

EXTRA INFO:

(if you are interested)

Another option is to turn your function into a generator instead. This has the advantage of not having to store all the similar authors in a list and instead yields each author as it is found:

def get_similar2(author_list, author):
    for item in author_list[author]:
        if item in author_list:
            for other_author in author_list[item]:
                yield other_author 

Or, if you are using Python 3.3+, you can simplify this a bit with the yield from expression, which gives functionally the same code as get_similar2:

def get_similar3(author_list, author):
    for item in author_list[author]:
        if item in author_list:
            yield from author_list[item]

All three of the functions/generators above will give you the same results (just remember to get all the values yielded from the generators):

print(get_similar(authors, 'Harlan Ellison'))
['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer', 'Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']

print(list(get_similar2(authors, 'Harlan Ellison')))
['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer', 'Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']

print(list(get_similar3(authors, 'Harlan Ellison')))
['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer', 'Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']
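
To illustrate the laziness mentioned above, here is a small sketch, assuming a generator equivalent to get_similar3 (named iter_similar here, a hypothetical name) and a trimmed copy of the dictionary. itertools.islice pulls only the first few results, and sorted accepts the generator directly when alphabetical order is wanted.

```python
from itertools import islice

authors = {
    'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'],
    'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'],
    'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut'],
}

def iter_similar(author_list, author):
    # Same idea as get_similar3, but tolerant of an unknown author
    for item in author_list.get(author, []):
        if item in author_list:
            yield from author_list[item]

# Only three values are ever produced; the rest of the data is not visited
first_three = list(islice(iter_similar(authors, 'Harlan Ellison'), 3))
print(first_three)  # ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams']

# sorted() consumes any iterable, so no intermediate list() call is needed
alphabetical = sorted(iter_similar(authors, 'Harlan Ellison'))
print(alphabetical)
```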

Upvotes: 1

Neil

Reputation: 14313

I'd use recursion to find similar authors in this fashion. Come to find out, it is even more inconvenient (and dangerous and slower) to want to return duplicates.

authors = {'Ray Bradbury': ['Harlan Ellison', 'Robert Heinlein', 'Isaac Asimov', 'Arthur Clarke'], 'Harlan Ellison': ['Neil Stephenson', 
           'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'], 'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 
           'Neil Stephenson', 'Jeff Vandemeer'], 'Thomas Pynchon': ['Isaac Asimov', 'Jorges Borges', 'Robert Heinlein'], 'Isaac Asimov': 
           ['Stephen Baxter', 'Ray Bradbury', 'Arthur Clarke', 'Kurt Vonnegut', 'Neil Stephenson'], 'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']}

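def get_similar(author_list, author, currentList=None):
    # Default to None: a mutable default list would be shared between calls
    if currentList is None:
        currentList = []
    for similar in author_list[author]:
        if similar not in currentList:
            currentList.append(similar)
            if similar in author_list:
                # Recurse on the similar author, not on the original author
                get_similar(author_list, similar, currentList)
    return sorted(currentList)

print(get_similar(authors, "Harlan Ellison"))

Returns:

['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut', 'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Tom Robbins']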

Upvotes: 0

Martin

Reputation: 1113

I think this is what you are looking for. Hopefully it gets you going.

authors = {'Ray Bradbury': ['Harlan Ellison', 'Robert Heinlein', 'Isaac Asimov', 'Arthur Clarke'], 'Harlan Ellison': ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams'], 'Kurt Vonnegut': ['Terry Pratchett', 'Tom Robbins', 'Douglas Adams', 'Neil Stephenson', 'Jeff Vandemeer'], 'Thomas Pynchon': ['Isaac Asimov', 'Jorges Borges', 'Robert Heinlein'], 'Isaac Asimov': ['Stephen Baxter', 'Ray Bradbury', 'Arthur Clarke', 'Kurt Vonnegut', 'Neil Stephenson'], 'Douglas Adams': ['Terry Pratchett', 'Chris Moore', 'Kurt Vonnegut']}


def get_similar(authors, author):
    retVal = []
    for k, v in authors.items():
        if k == author:
            for value in v:
                retVal.append(value)
                if value in authors:
                    for v2 in authors[value]:
                        retVal.append(v2)
    return sorted(retVal)

get_similar(authors, "Harlan Ellison") returns ['Chris Moore', 'Douglas Adams', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut', 'Kurt Vonnegut', 'Neil Stephenson', 'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Terry Pratchett', 'Tom Robbins']

I'll leave it to you to figure out how to remove the duplicates.
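
For reference, one common way to drop the duplicates is to pass the list through a set before sorting. The sketch below uses the output shown above as sample data; the variable names are illustrative only.

```python
# Sample data: the sorted output listed above, duplicates included
names = ['Chris Moore', 'Douglas Adams', 'Douglas Adams', 'Jeff Vandemeer',
         'Kurt Vonnegut', 'Kurt Vonnegut', 'Neil Stephenson', 'Neil Stephenson',
         'Richard Morgan', 'Terry Pratchett', 'Terry Pratchett', 'Tom Robbins']

# A set keeps one copy of each name; sorted() restores alphabetical order
unique_sorted = sorted(set(names))
print(unique_sorted)
# ['Chris Moore', 'Douglas Adams', 'Jeff Vandemeer', 'Kurt Vonnegut',
#  'Neil Stephenson', 'Richard Morgan', 'Terry Pratchett', 'Tom Robbins']

# If first-seen order matters instead, dict keys preserve insertion order (Python 3.7+)
unsorted_names = ['Neil Stephenson', 'Kurt Vonnegut', 'Neil Stephenson']
unique_in_order = list(dict.fromkeys(unsorted_names))
print(unique_in_order)  # ['Neil Stephenson', 'Kurt Vonnegut']
```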

Upvotes: 1

Shailyn Ortiz

Reputation: 766

What is happening is that a function stops at the first return statement it reaches, so your loop never gets past its first iteration. To fix this, return the full list without iterating:

def get_similar(author_list, author):
     return sorted(author_list[author])
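
A small sketch of the difference, using hypothetical helper names (first_only, collect_all): a return inside the loop ends the function on the first pass, whereas collecting into a list and returning after the loop sees every item.

```python
def first_only(items):
    for item in items:
        # The function exits here during the very first iteration
        return item

def collect_all(items):
    collected = []
    for item in items:
        collected.append(item)
    # The return sits after the loop, so every item has been visited
    return collected

similar = ['Neil Stephenson', 'Kurt Vonnegut', 'Richard Morgan', 'Douglas Adams']
print(first_only(similar))   # Neil Stephenson
print(collect_all(similar))  # all four names
```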

Upvotes: 0
