Brandon Charette

Reputation: 47

Combine elements in list of Tuples?

I'm working on a program that reads an IMDB text file and prints the top N actors (ranked by number of movie appearances), where N is supplied by the user.

However, actors who appear in the same number of movies each take up a separate slot in the output, which I need to avoid. Instead, if two actors are each in 5 movies, for example, the number 5 should appear once and the actors' names should be combined, separated by a semicolon.

I've tried multiple workarounds to this and nothing has yet worked. Any suggestions?

if __name__ == "__main__":
    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    print imdb_file
    N= input('Enter the number of top individuals ==> ')
    print N


    actors_to_movies = {}

    for line in open(imdb_file):
        words = line.strip().split('|')
        actor = words[0].strip()
        movie = words[1].strip()
        if not actor in actors_to_movies:
            actors_to_movies[actor] = set()
        actors_to_movies[actor].add(movie)

    movie_list= sorted(list(actors_to_movies[actor])) 

    #Arranges Dictionary into List of Tuples#
    D = [ (x, actors_to_movies[x]) for x in actors_to_movies]
    descending = sorted(D, key = lambda x: len(x[1]), reverse=True)

    #Prints Tuples in Descending Order N number of times (User Input)#
    for i in range(N):
        print str(len(descending[i][1]))+':', descending[i][0]

Upvotes: 0

Views: 237

Answers (1)

Dmitry Ermolov

Reputation: 2237

The function itertools.groupby is useful here.

It splits a list into groups of consecutive items that share a key, so the list has to be sorted by that key first. Using it, you can easily write a function that prints the top actors:

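import itertools

def print_top_actors(actor_info_list, top=5):
    """
    :param actor_info_list: list of (actor_name, movie_count) tuples
    :param top: number of distinct movie counts to print
    """
    # groupby only merges consecutive items, so sort by movie count first.
    actor_info_list.sort(key=lambda x: x[1], reverse=True)
    # Group on the movie count (x[1]); without a key, groupby compares whole tuples.
    groups = itertools.groupby(actor_info_list, key=lambda x: x[1])
    for i, (movie_count, actor_iter) in enumerate(groups):
        if i >= top:
            break
        print movie_count, ';'.join(actor for actor, _ in actor_iter)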

An example of usage:

>>> print_top_actors(
...     [
...         ("DiCaprio", 100500),
...         ("Pitt", 100500),
...         ("foo", 10),
...         ("bar", 10),
...         ("baz", 10),
...         ("qux", 3),
...         ("lol", 1)
...     ], top = 3)
100500 DiCaprio;Pitt
10 foo;bar;baz
3 qux
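
To connect this to the actors_to_movies dictionary from the question, the sets of movies first have to be reduced to counts. The following is only a minimal sketch of that step, not a definitive implementation: the function name top_actor_groups and the sample data are hypothetical, and sorting tied actors alphabetically is an assumption made to keep the output deterministic. It returns the groups instead of printing them, and the final print call is written so that it behaves the same on Python 2 and Python 3.

```python
import itertools


def top_actor_groups(actors_to_movies, top):
    """Return up to `top` (movie_count, [actor names]) pairs, highest count first.

    `actors_to_movies` maps an actor name to a set of movie titles,
    as built in the question. The function name is hypothetical.
    """
    # Reduce each actor's set of movies to a count.
    counts = [(actor, len(movies)) for actor, movies in actors_to_movies.items()]
    # Sort by count descending, then by name (an assumed tie-break order).
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    groups = []
    # groupby merges consecutive items with the same count into one group.
    for movie_count, pairs in itertools.groupby(counts, key=lambda pair: pair[1]):
        groups.append((movie_count, [actor for actor, _ in pairs]))
    return groups[:top]


# Hypothetical sample data standing in for the parsed IMDB file.
sample = {
    "A": {"m1", "m2"},
    "B": {"m1", "m3"},
    "C": {"m4"},
}

for movie_count, actors in top_actor_groups(sample, 2):
    print("%d: %s" % (movie_count, ";".join(actors)))
```

With the sample data above, actors A and B share the first slot because both appear in two movies, and C takes the second slot.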

Upvotes: 3
