ngInit

Reputation: 3

Issue using RegEx with Linux find

I'm writing a script to find and list files on my video drive that aren't already in .mkv format, and also to list any multi-episode files, so that I can eventually convert and split these files properly.

Examples of files that should match:

Path/to/FilE332.1/Series Title/Season 01/Series - S01E03 - Episode Name Bluray-2160p.mkv
/Series - S01E103 - Episode Name WEBDL-1080p.mkv

Examples of files that shouldn't match:

Path/to/FilE332.1/Series Title/Season 01/Series - S01E04E05 - Episode Name SDTV.mkv
/Series - S01E04E05 - Episode Name SDTV.mkv

Here's the command I came up with:

find /path/to/files -type f ! -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"

This regex seems to be working properly when tested on regex101's website, so I'm pretty confident that the regex string is correct: https://regex101.com/r/iyUbh6/1

I've tried adding the -regextype flag to no avail:

find /path/to/files -type f ! -regextype posix-egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-basic -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"

I also read some stuff about \d not working properly, so I tried changing it to [[:digit:]]. That didn't work either.

find /path/to/files -type f ! -regextype posix-basic -regex ".*- S[[:digit:]]{2}E(?:[[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-extended -regex ".*- S[[:digit:]]{2}E([[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"

I don't really know where to go from here, so hopefully someone with more experience has some insight on this issue.

Upvotes: 0

Views: 112

Answers (2)

Bohemian

Reputation: 425053

I just pipe find to grep -v to do the filtering out:

find path -type f | grep -v '\.mkv$'
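
If the multi-episode files should stay in the list as well, the same pipe can carry the episode pattern. This is only a sketch: the sample paths are made up for illustration and are fed in with printf in place of a real find run, and grep -E (extended syntax) is assumed.

```shell
# Hypothetical sample paths standing in for the output of `find path -type f`.
filtered=$(printf '%s\n' \
  '/v/Series - S01E03 - Episode Name Bluray-2160p.mkv' \
  '/v/Series - S01E04E05 - Episode Name SDTV.mkv' \
  '/v/Series - S01E06 - Episode Name SDTV.avi' |
  grep -Ev -e '- S[0-9]{2}E[0-9]{2,3} -.*\.mkv$')  # -e because the pattern starts with "-"
printf '%s\n' "$filtered"
```

With these three sample lines, only the multi-episode .mkv and the .avi should remain. Note that grep works line by line, so file names containing newlines would break this approach.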

Upvotes: 0

Shawn

Reputation: 52419

Note: The following assumes you're using GNU find, which, since you mention Linux, is a safe bet.

The default regular expression syntax does not understand \d (use [0-9] or [[:digit:]] instead), and alternation is written \|. I don't think it supports repetition ranges; they're not documented. POSIX Basic Regular Expression syntax doesn't understand \d either, has no alternation (though some GNU implementations accept \| as an extension), and requires other constructs such as groups and repetition ranges to be escaped. None of the supported flavors supports non-capturing groups ((?:...)).
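
To illustrate the point about the default syntax, here is a rough sketch that avoids -regextype entirely: groups are written \( \), alternation is \|, and the digits are spelled out rather than relying on repetition ranges. It assumes GNU find (for the default Emacs-style syntax and -printf); the scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, for demonstration only.
demo_dir=$(mktemp -d)
touch "$demo_dir/Series - S01E03 - Episode Name Bluray-2160p.mkv" \
      "$demo_dir/Series - S01E103 - Episode Name WEBDL-1080p.mkv" \
      "$demo_dir/Series - S01E04E05 - Episode Name SDTV.mkv"

# Default (Emacs-style) syntax: \( \) groups, \| alternates, digits spelled out.
default_syntax_hits=$(find "$demo_dir" -type f \
  ! -regex '.*- S[0-9][0-9]E\([0-9][0-9][0-9]\|[0-9][0-9]\) -.*\.mkv' \
  -printf '%f\n')
printf '%s\n' "$default_syntax_hits"
rm -rf "$demo_dir"
```

Of the three sample files, only the multi-episode one should be listed, since the two single-episode names match the pattern and are excluded by the !.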

Since your alternating group tests for either two or three digits, it can be turned into a single range when using one of the RE flavors that supports them.

So, something like:

find /path/to/files -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"

is probably the cleanest approach.
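
A quick way to check that command against the cases from the question is to run it in a throwaway directory. This is a sketch assuming GNU find; the scratch directory, the extra .avi file, and the -printf/sort formatting are additions made for the demonstration.

```shell
# Hypothetical scratch directory and file names, for demonstration only.
tmp_dir=$(mktemp -d)
touch "$tmp_dir/Series - S01E03 - Episode Name Bluray-2160p.mkv" \
      "$tmp_dir/Series - S01E103 - Episode Name WEBDL-1080p.mkv" \
      "$tmp_dir/Series - S01E04E05 - Episode Name SDTV.mkv" \
      "$tmp_dir/Series - S01E06 - Episode Name SDTV.avi"

# List every file that is NOT a single-episode .mkv.
# -regextype comes before the "!" so that only -regex is negated.
not_single_mkv=$(find "$tmp_dir" -regextype posix-extended -type f \
  ! -regex '.*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv' -printf '%f\n' | sort)
printf '%s\n' "$not_single_mkv"
rm -rf "$tmp_dir"
```

This should list the multi-episode .mkv and the .avi, and skip the two single-episode .mkv files. The position of -regextype matters: in the attempts from the question the ! sits directly in front of -regextype, so it negates that option (which, as far as I know, always evaluates to true in GNU find) rather than the -regex test, and the whole expression then never matches.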

Upvotes: 1
