Jorman Franzini

Reputation: 339

Exclude/include regex patterns

I want to get better at writing different kinds of regex, so I have been trying to make the following work.

Here is my expression: https://regex101.com/r/VzspFy/4/

In the test strings, the first three are good: patterns like those must be matched. The problem is the last one, which I don't want to be included, so I tried this:

https://regex101.com/r/9HVKTK/2

and this:

https://regex101.com/r/9HVKTK/1

But no luck!

The main idea is:

`aaa ... bbb ccc` -> must match
`ccc ... (aaa|ddd|eee) ... bbb ccc` -> should not match

How can I make it work, or is there a better approach?

Upvotes: 0

Views: 155

Answers (2)

Robo Mop

Reputation: 3553

Here's a relatively simple regex for your problem:

(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)

which you can test out here.

REGEX:

(?<=[-]\s) is a positive lookbehind that makes sure the match is preceded by a dash and a space (without including them in the match).

(?:ITA\s)? is an optional non-capturing group: if "ITA" followed by a whitespace character comes right after the dash and space, it is included in the match as well.

\w{3} matches exactly three word characters (letters, digits, or underscores, in any combination).

\s matches a single whitespace character (here, a space), and

[-] is just a fancy way of matching a single -.

|(?:Eng\.sub\.ita) adds an alternative branch that matches the literal text Eng.sub.ita (case-insensitively, with the i flag on), so it is found alongside the other matches when both appear in the same string.

Note:

If the show title itself has the same shape (a dash and a space, two three-letter words, another dash and space, then three more three-letter words, e.g. - red SEO - two one six), then part of the title will be matched as well.

However, the likelihood of a show containing such a format is negligible, so you needn't worry about that.
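To see the pattern, and the caveat from the note, in action, here is a minimal C# sketch. It assumes the regex101 demo used the case-insensitive flag (suggested by the remark about Eng.sub.ita), so it passes RegexOptions.IgnoreCase. The class and method names are made up for illustration; the first two titles are borrowed from the other answer's demo, and the third is a hypothetical title built to trigger the false positive.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern copied from the answer; IgnoreCase is an assumption based on
    // the "(case-insensitive)" remark about Eng.sub.ita.
    static readonly Regex Rx = new Regex(
        @"(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)",
        RegexOptions.IgnoreCase);

    // Collects every non-overlapping match, scanning left to right.
    public static List<string> FindAll(string title)
    {
        var found = new List<string>();
        foreach (Match m in Rx.Matches(title))
            found.Add(m.Value);
        return found;
    }

    public static void Main()
    {
        var titles = new[]
        {
            "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
            "Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]",
            // Hypothetical title that reproduces the caveat from the note.
            "Show - Red Seo - Two One Six S01e01 [XviD - Eng Mp3 - Sub Ita Eng]"
        };
        foreach (var t in titles)
            Console.WriteLine(string.Join(" | ", FindAll(t)));
    }
}
```

On these inputs it should print `Eng Mp3 - Sub Ita Eng`, then `Ita Eng Ac3 - Sub Ita Eng | ENG.Sub.ITA` (the optional ITA prefix is pulled into the first match, and the alternation adds the second), then `Red Seo - Two One Six | Eng Mp3 - Sub Ita Eng`, where the first item is the false positive described above.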

Upvotes: 0

Wiktor Stribiżew

Reputation: 627537

You may use

var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

See the regex demo. You need to get Group 1 values.

Pattern details

  • (?:^|]) - either start of string or ] (add | RegexOptions.Multiline if you have a multiline string as input, but I suppose these are all standalone strings)
  • (?:(?!\b(?:eng|ita)\b)[^]])* - any char but ], as many as possible, that does not start a whole word eng or ita (see tempered greedy token to understand this construct better)
  • \b - a word boundary
  • (eng(?:\W+\w+)?\W+sub\W+ita) - Group 1:
    • eng - a literal substring
    • (?:\W+\w+)? - an optional sequence of 1+ non-word chars followed by 1+ word chars (in effect, an optional word)
    • \W+ - 1+ non-word chars
    • sub - a literal substring
    • \W+ - 1+ non-word chars
    • ita - a literal substring
  • \b - a word boundary
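The tempered greedy token is easiest to understand in isolation. The sketch below (class and method names are hypothetical) runs only the leading part of the pattern, anchored with ^, and returns how far it gets before it refuses to step onto a whole word eng or ita.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemperedTokenDemo
{
    // Leading part of the answer's pattern only: from the start of the string,
    // consume any char except ']', but stop before a whole word "eng" or "ita".
    static readonly Regex Prefix = new Regex(
        @"^(?:(?!\b(?:eng|ita)\b)[^]])*", RegexOptions.IgnoreCase);

    public static string LeadingRun(string s) => Prefix.Match(s).Value;

    public static void Main()
    {
        // Quotes make the trailing space of each run visible.
        Console.WriteLine($"\"{LeadingRun("Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip")}\"");
        Console.WriteLine($"\"{LeadingRun("Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux")}\"");
    }
}
```

For the Lucifer title the run stops right before Eng, so \b(eng... can take over. For the Young Sheldon title it stops before Ita; the token will not consume that word and no earlier position starts with eng, so the full pattern fails from the start of the string, and the only other allowed starting point (the closing ]) is not followed by an eng ... sub ... ita sequence. That is why the demo reports No match! for that title.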

See the C# demo:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var strs = new List<string> { 
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE",
        "Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
        "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
        "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
        "Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL ",
        "Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]"
    };
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (var s in strs)
{
    Console.WriteLine(s);
    var result = rx.Match(s);
    if (result.Success)
        Console.WriteLine("Matched: {0}", result.Groups[1].Value);
    else
        Console.WriteLine("No match!");
    Console.WriteLine("==========================================");
}

Output:

Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE
Matched: Eng Mp3 - Sub Ita
==========================================
Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL 
Matched: Eng Mp3 - Sub Ita
==========================================
Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]
Matched: ENG.Sub.ITA
==========================================

Upvotes: 1
