Reputation: 13
I have a CSV data file and have built a dictionary from it of the form {movie_id: ('title', ['genres'])}. How can I remove the empty strings that end up in the list of genres inside each tuple in the dictionary?
The data file (.csv) looks like this:
movie_id title genres
68735 Warcraft Action Adventure Comedy
124057 Kids at the round table
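import csv
from typing import Dict, List, TextIO, Tuple

# Assumed definition of the MovieDict alias used in the signature below.
MovieDict = Dict[int, Tuple[str, List[str]]]

def read_movies(movie_file: TextIO) -> MovieDict:
    """Return a dictionary mapping each movie id to (movie name, movie genres)
    for the movies in movie_file.
    """
    # Skip the header line before handing the file to csv.reader.
    movie_file.readline()
    reader = csv.reader(movie_file)
    # Genres start at column index 4; empty fields show up as '' here.
    movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}
    return movie_dict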
I expect the returned movie_dict to be:
{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action']), 124057: ('Kids of the Round Table', [])}
What I get with my code:
{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action', '', '']), 124057: ('Kids of the Round Table', ['', '', ''])}
Upvotes: 0
Views: 126
Reputation: 674
dictionary = {}
dictionary['a'] = ('name', ['', 'p', 'q', '', ''])
for key in dictionary.keys():
    x, y = dictionary[key]
    print(x, y)
    # Rebuild the tuple, keeping only the non-empty strings from the list.
    dictionary[key] = (x, [s for s in y if len(s) != 0])
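The same clean-up can also be written so that it builds a new dictionary instead of reassigning entries inside the loop. This is only a minimal sketch: `drop_empty_genres` is a hypothetical helper name, and the sample data is taken from the output shown in the question.

```python
from typing import Dict, List, Tuple

MovieDict = Dict[int, Tuple[str, List[str]]]


def drop_empty_genres(movies: MovieDict) -> MovieDict:
    """Return a new dict whose genre lists contain no empty strings."""
    return {
        movie_id: (title, [genre for genre in genres if genre != ''])
        for movie_id, (title, genres) in movies.items()
    }


raw = {
    302156: ('Criminal', ['Action', '', '']),
    124057: ('Kids of the Round Table', ['', '', '']),
}
cleaned = drop_empty_genres(raw)
print(cleaned)
```

Because a new dictionary is returned, the original `raw` data is left untouched, which can be handy while debugging.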
Upvotes: 0
Reputation: 3786
The easiest way to go would be to filter the empty strings out:
non_empty = lambda s: len(s) > 0
movie_dict = {int(rows[0]): (rows[1], list(filter(non_empty, rows[4:]))) for rows in reader}
non_empty is a small function (a lambda bound to a name) that checks whether a string, or really anything that supports len, is non-empty. It returns True for non-empty strings and False for empty ones.
Passing it to filter together with rows[4:] yields only the values from rows[4:] for which it returned True, that is, the non-empty ones. Wrapping the result in list turns it into a list.
You could also use a list comprehension to filter out the empty strings: [s for s in rows[4:] if len(s) > 0] does exactly the same thing.
Either way, the second item in your tuple is a list containing only the non-empty strings.
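A quick way to check that the two approaches agree, sketched on a sample genre list taken from the question's output. The third variant, `filter(None, ...)`, is not part of the answer above; it is an alternative that relies on empty strings being falsy in Python.

```python
genres = ['Action', '', '']

non_empty = lambda s: len(s) > 0

# Variant 1: filter with the predicate from the answer.
with_filter = list(filter(non_empty, genres))

# Variant 2: the equivalent list comprehension.
with_comprehension = [s for s in genres if len(s) > 0]

# Variant 3 (alternative): filter(None, ...) keeps only truthy items,
# and an empty string is falsy.
with_filter_none = list(filter(None, genres))

print(with_filter, with_comprehension, with_filter_none)
```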
Upvotes: 1
Reputation: 1272
It's not clear what your file looks like, how big it is, or why you want to parse it this way rather than with pandas (for example).
To answer your question, though: you can achieve this in your code
by replacing this line
movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}
by
movie_dict = {int(rows[0]): (rows[1], [e for e in rows[4:] if e != '']) for rows in reader}
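For a version that can be run end to end, here is a minimal sketch with that replacement applied. The exact layout of the real file is not shown in the question, so the sample below is an assumption: columns 2 and 3 (`col2`, `col3`) are hypothetical placeholders, and genres start at index 4 to match `rows[4:]`. The `MovieDict` alias and the blank-line guard are likewise assumptions.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

MovieDict = Dict[int, Tuple[str, List[str]]]


def read_movies(movie_file: TextIO) -> MovieDict:
    """Map each movie id to (title, genres), skipping empty genre fields."""
    reader = csv.reader(movie_file)
    next(reader, None)  # skip the header row
    return {
        int(row[0]): (row[1], [genre for genre in row[4:] if genre != ''])
        for row in reader
        if row  # ignore completely blank lines
    }


# Hypothetical sample: columns 2 and 3 are placeholders; genres start at index 4.
sample = io.StringIO(
    "movie_id,title,col2,col3,genres\n"
    "68735,Warcraft,x,x,Action,Adventure,Fantasy\n"
    "302156,Criminal,x,x,Action,,\n"
    "124057,Kids of the Round Table,x,x,,,\n"
)
movies = read_movies(sample)
print(movies)
```

The trailing commas in the last two sample rows are what produce the empty strings: csv.reader returns an empty field as '', so filtering on `genre != ''` removes exactly those entries.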
Upvotes: 2