Kabir Singh

Reputation: 13

How to remove empty strings from a list within a dictionary?

I have a CSV data file from which I build a dictionary of the form {movie_id: ('title', ['genres'])}. How do I remove the empty strings that end up in the list of genres inside each tuple in the dictionary?

The data file (.csv) looks like this:

movie_id title genres
68735 Warcraft Action Adventure Comedy
124057 Kids at the round table

def read_movies(movie_file: TextIO) -> MovieDict:

    """Return a dictionary containing movie id to (movie name, movie genres)
    in the movie_file.
    """

    line = movie_file.readline()
    while line == '':
        line = movie_file.readline()

    reader = csv.reader(movie_file)

    movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}

    return movie_dict

I expect movie_dict to be:

{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action']), 124057: ('Kids of the Round Table', [])}

What I get with my code:

{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action', '', '']), 124057: ('Kids of the Round Table', ['', '', ''])}

Upvotes: 0

Views: 126

Answers (3)

Parijat Bhatt

Reputation: 674

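dictionary = {}
dictionary['a'] = ('name', ['', 'p', 'q', '', ''])
# Reassigning values for existing keys is safe while iterating over the keys.
for key in dictionary.keys():
    x, y = dictionary[key]
    print(x, y)
    # Keep only the non-empty strings in the list.
    dictionary[key] = (x, [s for s in y if len(s) != 0])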
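The same clean-up can also be written as a dictionary comprehension that builds a new dictionary instead of updating the old one in place. This is only a sketch: the helper name strip_empty_genres is hypothetical, and the sample data is taken from the output shown in the question.

```python
def strip_empty_genres(movie_dict):
    """Return a new dict with empty strings removed from each genre list."""
    return {
        movie_id: (title, [genre for genre in genres if genre != ''])
        for movie_id, (title, genres) in movie_dict.items()
    }


# Sample entries copied from the question's actual output.
movies = {
    302156: ('Criminal', ['Action', '', '']),
    124057: ('Kids of the Round Table', ['', '', '']),
}
cleaned = strip_empty_genres(movies)
print(cleaned)
```

The original movies dictionary is left untouched, which may be convenient if the unfiltered rows are still needed elsewhere.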

Upvotes: 0

Neo

Reputation: 3786

The easiest way to go would be to filter the empty strings out:

non_empty = lambda s: len(s) > 0
movie_dict = {int(rows[0]): (rows[1], list(filter(non_empty, rows[4:]))) for rows in reader}

non_empty is a lambda function that checks whether a string (or really anything we can call len on) is non-empty. It returns True for non-empty strings and False for empty ones. Passing it to filter along with rows[4:] yields only the values for which it returned True, i.e. the non-empty ones, and wrapping the result in list gives a filtered copy of rows[4:].

You could also use a list comprehension to filter out the empty strings: [s for s in rows[4:] if len(s) > 0] does exactly the same thing.

Either way, the second item in your tuple is a list containing only the non-empty strings.
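As a quick check that the two approaches agree, here is a small sketch run on a stand-in for rows[4:] (the variable row_tail and its contents are made up for illustration). It also shows filter(None, ...), a built-in shorthand that drops every falsy value, which includes empty strings.

```python
# Stand-in for rows[4:] from one CSV row; the values are illustrative.
row_tail = ['Action', '', '']

non_empty = lambda s: len(s) > 0

via_filter = list(filter(non_empty, row_tail))
via_comprehension = [s for s in row_tail if len(s) > 0]
# filter(None, ...) keeps only truthy items, so '' is dropped as well.
via_truthiness = list(filter(None, row_tail))

print(via_filter, via_comprehension, via_truthiness)
```

Note that filter(None, ...) would also drop other falsy values such as 0 or None, which does not matter here because csv.reader only produces strings.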

Upvotes: 1

David Sidarous

Reputation: 1272

It's not clear what your file looks like, how big it is, or why you want to parse it this way rather than with Pandas (for example).

To answer your question, though: you can achieve this in your code

by replacing this line

movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}

with

movie_dict = {int(rows[0]): (rows[1], [e for e in rows[4:] if e != '']) for rows in reader}
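For reference, a self-contained sketch of the whole function with that change applied. Several things here are assumptions, since the real file layout isn't shown: the sample CSV text, the two placeholder columns between the title and the genres (col_a, col_b), and skipping the header with next(reader, None) in place of the readline loop from the question. MovieDict is defined here only so the snippet runs on its own.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Assumed definition of the alias used in the question's signature.
MovieDict = Dict[int, Tuple[str, List[str]]]


def read_movies(movie_file: TextIO) -> MovieDict:
    """Return a dict mapping movie id to (movie name, non-empty genres)."""
    reader = csv.reader(movie_file)
    next(reader, None)  # assumption: the first row is a header
    return {
        int(row[0]): (row[1], [e for e in row[4:] if e != ''])
        for row in reader
        if row  # skip completely blank lines
    }


# Hypothetical sample data; the real column layout may differ.
sample = io.StringIO(
    "movie_id,title,col_a,col_b,genre1,genre2,genre3\n"
    "68735,Warcraft,x,y,Action,Adventure,Fantasy\n"
    "302156,Criminal,x,y,Action,,\n"
    "124057,Kids of the Round Table,x,y,,,\n"
)
movies = read_movies(sample)
print(movies)
```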

Upvotes: 2
