Reputation: 3
I'm writing a script to find and list files on my video drive that aren't already in .mkv format, as well as any multi-episode files, so that I can eventually convert and split them properly.
Examples of files that should match:
Path/to/FilE332.1/Series Title/Season 01/Series - S01E03 - Episode Name Bluray-2160p.mkv
/Series - S01E103 - Episode Name WEBDL-1080p.mkv
Examples of files that shouldn't match:
Path/to/FilE332.1/Series Title/Season 01/Series - S01E04E05 - Episode Name SDTV.mkv
/Series - S01E04E05 - Episode Name SDTV.mkv
Here's the command I came up with:
find /path/to/files -type f ! -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
This regex seems to be working properly when tested on regex101's website, so I'm pretty confident that the regex string is correct: https://regex101.com/r/iyUbh6/1
I've tried adding the -regextype flag to no avail:
find /path/to/files -type f ! -regextype posix-egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-basic -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
I also read some stuff about \d not working properly, so I tried changing it to [[:digit:]]. That didn't work either.
find /path/to/files -type f ! -regextype posix-basic -regex ".*- S[[:digit:]]{2}E(?:[[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-extended -regex ".*- S[[:digit:]]{2}E([[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"
I don't really know where to go from here, so hopefully someone with more experience has some insight on this issue.
Upvotes: 0
Views: 112
Reputation: 425053
I just pipe find to grep -v to do the filtering. Quote the pattern so the shell doesn't strip the backslash, and anchor it to the end of the name:
find path -type f | grep -v '\.mkv$'
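The same pipe can also catch the multi-episode files from the question. This is only a sketch: the pattern is an adaptation of the question's regex to POSIX extended syntax, list_unconverted is a made-up helper name, and it assumes no file name contains a newline.

```shell
# list_unconverted (hypothetical name) reads paths on stdin and prints
# every path that is NOT a single-episode .mkv file.
# -E selects extended regex syntax, -v inverts the match, and "--" stops
# option parsing because the pattern itself starts with a dash.
list_unconverted() {
  grep -Ev -- '- S[0-9]{2}E[0-9]{2,3} -.*\.mkv$'
}

# Usage: find /path/to/files -type f | list_unconverted
```

A name like "S01E04E05" falls through because the two or three digits after "E" must be followed directly by " -".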
Upvotes: 0
Reputation: 52419
Note: The following assumes you're using GNU find, which, since you mention Linux, is a safe bet.
The default regular expression syntax does not understand \d (instead you'd use [0-9] or [[:digit:]]). Alternation is \|. I don't think it supports repetition ranges; they're not documented. POSIX Basic Regular Expression syntax also doesn't understand \d, or alternation (though some GNU implementations do as an extension using \|), and requires many other things like groups and repetition ranges to be escaped. And none of the supported flavors supports non-capturing grouping ((?:...)).
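To make those differences concrete, here is a rough sketch of the question's pattern written for the default syntax, with no -regextype at all. It assumes GNU find; the scratch directory and variable names are made up for the demonstration, and the digits are spelled out one by one to sidestep the repetition-range question.

```shell
# Hypothetical scratch directory holding one single-episode file and one
# multi-episode file, just to exercise the pattern.
scratch=$(mktemp -d)
touch "$scratch/Series - S01E103 - Episode Name WEBDL-1080p.mkv" \
      "$scratch/Series - S01E04E05 - Episode Name SDTV.mkv"

# Default syntax: \( \) for grouping, \| for alternation, [0-9] in place
# of \d, and repeated bracket expressions in place of {2} and {3}.
default_syntax_hits=$(find "$scratch" -type f \
  ! -regex '.*- S[0-9][0-9]E\([0-9][0-9][0-9]\|[0-9][0-9]\) -.*\.mkv')

# Only the multi-episode file should be listed.
printf '%s\n' "$default_syntax_hits"
rm -rf "$scratch"
```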
Since your alternating group tests for either two or three digits, it can be turned into a single range when using one of the RE flavors that supports them.
So, something like:
find /path/to/files -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"
is probably the cleanest approach.
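If you want to check it before pointing it at the real drive, something like the following throwaway test should do. It assumes GNU find (-regextype is a GNU extension); the file names come from the question, except the .avi one, which is invented here to cover the not-yet-converted case.

```shell
# Build a temporary tree that mirrors the question's layout.
demo_dir=$(mktemp -d)
season="$demo_dir/Series Title/Season 01"
mkdir -p "$season"
touch "$season/Series - S01E03 - Episode Name Bluray-2160p.mkv" \
      "$season/Series - S01E103 - Episode Name WEBDL-1080p.mkv" \
      "$season/Series - S01E04E05 - Episode Name SDTV.mkv" \
      "$season/Series - S01E06 - Episode Name SDTV.avi"

# -regex has to match the whole path, hence the leading ".*".
listed=$(find "$demo_dir" -regextype posix-extended -type f \
  ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv" | sort)

# Expect the multi-episode .mkv and the .avi, but neither single-episode .mkv.
printf '%s\n' "$listed"
rm -rf "$demo_dir"
```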
Upvotes: 1