Reputation: 339
I'm trying to get better at writing different kinds of regular expressions, and I've been trying to get the following one to work.
Here is my expression: https://regex101.com/r/VzspFy/4/
Among the test strings, the first three are matched correctly, and patterns like those must match. The problem is the last one, which I don't want to be matched, so I tried this:
https://regex101.com/r/9HVKTK/2
and this:
https://regex101.com/r/9HVKTK/1
But no luck!
The main idea is:
`aaa ... bbb ccc` -> must match
`ccc ... (aaa|ddd|eee) ... bbb ccc` -> should not match
How can I make it work, or is there a better way to implement this?
Upvotes: 0
Views: 155
Reputation: 3553
Here's a relatively simple regex for your problem:
(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)
which you can test out here.
`(?<=[-]\s)` is a positive look-behind that makes sure the match is preceded by a dash and a whitespace character (without including them in the match).
`(?:ITA\s)?` is an optional non-capturing group: if the match starts with "ITA" followed by whitespace, those are matched as well.
`\w{3}` matches a string of three word characters (letters, digits, underscores, or a combination of them), `\s` matches a single whitespace character, and `[-]` is just a fancy way of matching a single `-`.
`|(?:Eng\.sub\.ita)` tells the regex to also match `eng.sub.ita` (case-insensitively, provided the case-insensitive flag is set), alongside the original matches if both are present in the same string.
If the name of the show contains something along the lines of `- red SEO - two one six`, that is, 'dash-space-three_letters-space-three_letters-space-dash-space-three_letters-space-three_letters-space-three_letters', then that part of the show's name will be matched as well.
However, the likelihood of a show name containing such a format is negligible, so you needn't worry about that.
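To make the caveat about show names concrete, here is a minimal C# sketch that runs the pattern above against a made-up title. The class and method names (`DashPatternDemo`, `FindAll`), the sample title, and the use of `RegexOptions.IgnoreCase` are assumptions for illustration only, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DashPatternDemo
{
    // The pattern from this answer. IgnoreCase is an assumption made for this sketch.
    private static readonly Regex Rx = new Regex(
        @"(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)",
        RegexOptions.IgnoreCase);

    // Returns every non-overlapping match, left to right.
    public static string[] FindAll(string input) =>
        Rx.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Hypothetical title whose show name has the same dash-separated shape as the tag.
        var title = "Show - red SEO - two one six [XviD - Eng Mp3 - Sub Ita Eng] DLRip";
        foreach (var value in FindAll(title))
            Console.WriteLine(value);
    }
}
```

Under these assumptions, `FindAll` returns two values for the sample title: `red SEO - two one six` from the show name and `Eng Mp3 - Sub Ita Eng` from the bracketed tag. If such titles can occur in your data, the first value is the false positive to watch for.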
Upvotes: 0
Reputation: 627537
You may use
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
See the regex demo. You need to get Group 1 values.
Pattern details
- `(?:^|])` - either start of string or `]` (add `| RegexOptions.Multiline` if you have a multiline string as input, but I suppose these are all standalone strings)
- `(?:(?!\b(?:eng|ita)\b)[^]])*` - any char but `]`, as many as possible, that does not start a whole word `eng` or `ita` (see tempered greedy token to understand this construct better)
- `\b` - a word boundary
- `(eng(?:\W+\w+)?\W+sub\W+ita)` - Group 1:
  - `eng` - a literal substring
  - `(?:\W+\w+)?` - an optional sequence of any 1+ non-word chars followed with 1+ word chars (actually, an optional word)
  - `\W+` - 1+ non-word chars
  - `sub` - a literal substring
  - `\W+` - 1+ non-word chars
  - `ita` - a literal substring
- `\b` - a word boundary

See the C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample release names (some are intentionally repeated)
var strs = new List<string> {
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE",
    "Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
    "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
    "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
    "Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL ",
    "Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]"
};
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (var s in strs)
{
    Console.WriteLine(s);
    var result = rx.Match(s);
    // Group 1 holds the "eng ... sub ita" part of the match
    if (result.Success)
        Console.WriteLine("Matched: {0}", result.Groups[1].Value);
    else
        Console.WriteLine("No match!");
    Console.WriteLine("==========================================");
}
Output:
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE
Matched: Eng Mp3 - Sub Ita
==========================================
Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL
Matched: Eng Mp3 - Sub Ita
==========================================
Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]
Matched: ENG.Sub.ITA
==========================================
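Two details from the pattern breakdown can be checked in isolation: what the tempered greedy token consumes on its own, and what adding `RegexOptions.Multiline` changes. The following is only a sketch under stated assumptions: the class and method names (`PatternPartsDemo`, `ConsumedPrefix`, `Group1Values`) and the two-line input are hypothetical and not part of the demo above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternPartsDemo
{
    // The full pattern from the answer.
    private const string Pattern =
        @"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b";

    // Only the anchored tempered greedy token, to show where it stops.
    private static readonly Regex Prefix = new Regex(
        @"^(?:(?!\b(?:eng|ita)\b)[^]])*", RegexOptions.IgnoreCase);

    // Text consumed before the first whole word "eng"/"ita" or the first ']'.
    public static string ConsumedPrefix(string input) => Prefix.Match(input).Value;

    // Group 1 of every match, with or without RegexOptions.Multiline.
    public static string[] Group1Values(string input, bool multiline)
    {
        var options = RegexOptions.IgnoreCase;
        if (multiline) options |= RegexOptions.Multiline;
        return Regex.Matches(input, Pattern, options)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }

    public static void Main()
    {
        // The token stops right before the whole word "Ita".
        Console.WriteLine("'" + ConsumedPrefix("[Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng]") + "'");

        // Hypothetical two-line input without any ']' characters.
        var twoLines = "The.Show.2017.ENG.Sub.ITA.HDRip\nOther.2018.ENG.Sub.ITA.XviD";
        Console.WriteLine(Group1Values(twoLines, multiline: false).Length);
        Console.WriteLine(Group1Values(twoLines, multiline: true).Length);
    }
}
```

For the first sample, the token consumes `[Mux 1080p - H264 - ` and stops before `Ita`, which is why that title produces no match in the demo. For the two-line input, only the first line is matched without `Multiline`, because the second line offers neither a string start nor a `]` to begin from; with `Multiline`, `^` also matches at the start of the second line, so both lines yield `ENG.Sub.ITA`.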
Upvotes: 1