Reputation: 73
So I have a list as follows:
mylist = ['movie1.mp4','movie2.srt','movie1.srt','movie3.mp4','movie1.mp4']
Note: this is a simple list for testing; the real script will deal with unknown file names, and more of them.
So I want to find the movie files that have a paired srt file and put those in a dictionary. Anything left (i.e. movie3.mp4) will stay in the list and be dealt with later.
I've been playing a bit with list comprehensions, though I'm not sure that approach can both keep the leftover data and let me construct the dictionary.
import re
matches = [ x for x, a in mylist if (re.sub('\.srt$', '\.mp4$', a ) == x or re.sub('\.srt$', '\.mp4$', a ) == x) ]
This returns:
ValueError: too many values to unpack
Any ideas on how I might approach this?
Upvotes: 0
Views: 1110
Reputation: 6387
I would divide the task into two separate concerns: first build a dictionary grouping files with the same root name; then check which groups have both a video and a subtitle file. (And please don't use regex to split filenames; os.path does much better here.)
from collections import defaultdict
import os

mylist = ['movie1.mp4','movie2.srt','movie1.srt','movie3.mp4','movie1.mp4']

# Group filenames by root name: movies[root][extension] = filename
movies = defaultdict(dict)
for filename in mylist:
    name, ext = os.path.splitext(filename)
    movies[name][ext] = filename

sub_extensions = set(['.txt', '.srt'])
movie_extensions = set(['.mp4', '.avi'])

for name, files in movies.items():
    files_set = set(files.keys())
    if not files_set & sub_extensions:
        continue  # no subs
    elif not files_set & movie_extensions:
        continue  # no movie
    else:
        print name, files.values()

# output: movie1 ['movie1.srt', 'movie1.mp4']
PS. What are you going to do with .mkv files with enclosed subtitles? ;)
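Building on the grouping idea above, here is a minimal sketch of how the paired files could end up in a dictionary while everything else stays in a list, which is what the question asks for. The names split_pairs, paired and leftover are made up for illustration, and the extension sets are assumptions; note that the comparison is case-sensitive, so a file like MOVIE.MP4 would not count as a movie here.

```python
import os
from collections import defaultdict

mylist = ['movie1.mp4', 'movie2.srt', 'movie1.srt', 'movie3.mp4', 'movie1.mp4']

# Assumed extension sets; adjust to whatever the real script must handle.
SUB_EXTENSIONS = {'.txt', '.srt'}
MOVIE_EXTENSIONS = {'.mp4', '.avi'}


def split_pairs(filenames):
    """Return (paired, leftover); paired maps root name -> {ext: filename}."""
    by_root = defaultdict(dict)
    for filename in filenames:
        root, ext = os.path.splitext(filename)
        by_root[root][ext] = filename  # duplicate names collapse to one entry

    paired, leftover = {}, []
    for root, files in by_root.items():
        exts = set(files)
        if exts & SUB_EXTENSIONS and exts & MOVIE_EXTENSIONS:
            paired[root] = files               # has both a movie and a subtitle
        else:
            leftover.extend(files.values())    # unmatched, deal with later
    return paired, leftover


paired, leftover = split_pairs(mylist)
```

With the test list, paired holds only the movie1 entry, and leftover holds movie2.srt and movie3.mp4.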
Upvotes: 0
Reputation: 63707
You are taking the wrong approach to your problem. The easiest way would be to determine the base names of the files using os.path.splitext and group the files by them. One possible approach is to use itertools.groupby, which needs its input sorted by the same key.
Implementation
import os
from itertools import groupby

groups = {key: set(value)
          for key, value in groupby(sorted(mylist,
                                           key = lambda e: os.path.splitext(e)[0]),
                                    key = lambda e: os.path.splitext(e)[0])}
Example
>>> pprint.pprint(groups)
{'movie1': set(['movie1.mp4', 'movie1.srt']),
 'movie2': set(['movie2.srt']),
 'movie3': set(['movie3.mp4'])}
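From groups like these, picking out the movie/subtitle pairs is a short filtering step. The following is only a sketch: the helper names root and has_pair, and the assumption that a pair means exactly one .mp4 plus one .srt extension, are mine and not part of the question.

```python
import os
from itertools import groupby

mylist = ['movie1.mp4', 'movie2.srt', 'movie1.srt', 'movie3.mp4', 'movie1.mp4']


def root(filename):
    """Filename without its extension, used as the grouping key."""
    return os.path.splitext(filename)[0]


# groupby only merges adjacent items, so sort by the same key first.
groups = {key: set(value)
          for key, value in groupby(sorted(mylist, key=root), key=root)}


def has_pair(files, movie_ext='.mp4', sub_ext='.srt'):
    """True if the group holds both a movie and a subtitle (assumed extensions)."""
    exts = {os.path.splitext(f)[1] for f in files}
    return movie_ext in exts and sub_ext in exts


paired = {key: files for key, files in groups.items() if has_pair(files)}
leftover = sorted(f for key, files in groups.items()
                  if key not in paired for f in files)
```

Here paired keeps only the movie1 group, and leftover collects the files from the other groups for later handling.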
Upvotes: 5