ryan461

Reputation: 73

Python. Iterate over a list of files, finding same filenames but different extensions

So I have a list as follows:

mylist = ['movie1.mp4','movie2.srt','movie1.srt','movie3.mp4','movie1.mp4']

Note: this is a simple list for testing; the real script will deal with unknown file names, and more of them.

So I want to find the movie files with a paired srt file, and put those in a dictionary. Anything left (i.e. movie3.mp4) will be left in the list and dealt with later.

I've been experimenting with a list comprehension, though I'm not sure it can both build the dictionary and keep the leftover entries.

import re
matches = [ x for x, a in mylist if (re.sub('\.srt$', '\.mp4$', a ) == x or re.sub('\.srt$', '\.mp4$', a ) == x) ]

This returns: ValueError: too many values to unpack

Any ideas on how I might approach this?

Upvotes: 0

Views: 1110

Answers (2)

m.wasowski

Reputation: 6387

I would divide the task into two separate concerns: first build a dictionary grouping files with the same root name, then check which groups have both a video and a subtitle file. (And please don't use regex to split filenames; os.path does a much better job here.)

from collections import defaultdict
import os

mylist = ['movie1.mp4','movie2.srt','movie1.srt','movie3.mp4','movie1.mp4']

movies = defaultdict(dict)
for filename in mylist:
    name, ext = os.path.splitext(filename)
    movies[name][ext] = filename

sub_extensions = set(['.txt', '.srt'])
movie_extensions = set(['.mp4', '.avi'])


for name, files in movies.items():
    files_set = set(files.keys())
    if not files_set & sub_extensions:
        continue # no subs
    elif not files_set & movie_extensions:
        continue # no movie
    else:
        print name, files.values()
# output: movie1 ['movie1.srt', 'movie1.mp4']
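
Building on the grouping idea above, here is a minimal sketch of how the paired files could go into a dictionary while everything else stays in a leftover list, as the question asks. It is written for Python 3.7+ (it relies on dicts keeping insertion order), and pair_movies, MOVIE_EXTENSIONS and SUB_EXTENSIONS are hypothetical names chosen for illustration, not part of the original answer.

```python
import os
from collections import defaultdict

# Assumed extension sets; extend them to match the real collection.
MOVIE_EXTENSIONS = {'.mp4', '.avi'}
SUB_EXTENSIONS = {'.srt', '.txt'}


def pair_movies(filenames):
    """Return ({movie: subtitle}, leftovers) for a list of filenames."""
    by_root = defaultdict(dict)
    for filename in filenames:
        root, ext = os.path.splitext(filename)
        # Duplicate filenames collapse here, since they share root and ext.
        by_root[root][ext.lower()] = filename

    paired, leftovers = {}, []
    for files in by_root.values():
        movies = [f for ext, f in files.items() if ext in MOVIE_EXTENSIONS]
        subs = [f for ext, f in files.items() if ext in SUB_EXTENSIONS]
        if movies and subs:
            # Pair the first movie with the first subtitle found.
            paired[movies[0]] = subs[0]
            used = {movies[0], subs[0]}
            leftovers.extend(f for f in files.values() if f not in used)
        else:
            # No complete pair for this root name: keep it for later.
            leftovers.extend(files.values())
    return paired, leftovers


mylist = ['movie1.mp4', 'movie2.srt', 'movie1.srt', 'movie3.mp4', 'movie1.mp4']
paired, leftovers = pair_movies(mylist)
print(paired)     # {'movie1.mp4': 'movie1.srt'}
print(leftovers)  # ['movie2.srt', 'movie3.mp4']
```

Keying the result by movie filename is just one possible choice; keying by root name would work equally well if that suits the later processing better.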

PS. What are you going to do with .mkv files with enclosed subtitles? ;)

Upvotes: 0

Abhijit

Reputation: 63707

You are adopting the wrong approach to your problem. The easiest way would be to determine the root names of the files using os.path.splitext and group them accordingly. A possible approach would be to use itertools.groupby, which only groups consecutive items, so the list has to be sorted by the same key first.

Implementation

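import os
from itertools import groupby

# Sort and group by the same key: the filename without its extension.
groups = {key: set(value)
      for key, value in groupby(sorted(mylist,
                                       key = lambda e: os.path.splitext(e)[0]),
                                key = lambda e: os.path.splitext(e)[0])}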

Example

>>> pprint.pprint(groups)
{'movie1': set(['movie1.mp4', 'movie1.srt']),
 'movie2': set(['movie2.srt']),
 'movie3': set(['movie3.mp4'])}
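
To see why the sorted() call matters, here is a small Python 3 sketch comparing groupby on the raw list with groupby on the sorted list. The helper name rootname is an assumption made for readability; it does the same job as the lambdas above.

```python
import os
from itertools import groupby


def rootname(filename):
    """Filename without its extension, e.g. 'movie1.mp4' -> 'movie1'."""
    return os.path.splitext(filename)[0]


mylist = ['movie1.mp4', 'movie2.srt', 'movie1.srt', 'movie3.mp4', 'movie1.mp4']

# Unsorted input: groupby starts a new group whenever the key changes,
# so 'movie1' shows up as three separate runs.
unsorted_keys = [key for key, _ in groupby(mylist, key=rootname)]
print(unsorted_keys)
# ['movie1', 'movie2', 'movie1', 'movie3', 'movie1']

# Sorted by the same key: exactly one group per root name.
groups = {key: set(group)
          for key, group in groupby(sorted(mylist, key=rootname), key=rootname)}
print(sorted(groups))            # ['movie1', 'movie2', 'movie3']
print(sorted(groups['movie1']))  # ['movie1.mp4', 'movie1.srt']
```

Had the unsorted runs been fed into the dict comprehension, the later 'movie1' groups would have silently overwritten the earlier ones.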

Upvotes: 5
