Reputation: 37
I'm searching for a single regular expression that matches the first number not inside any kind of brackets, searching from the right side of the string. Is this possible?
Sample Text:
[X-Y] Prelude of 2013 - 06 - From the darkness [FLAC 1080p][E0ECC01D].mkv
c:\Files\Prelude 2013[X-Y] Prelude of 2013 - 12 - From the darkness [FLAC 1080p][E0ECC01D].mkv
c:\Programm Files\Yamato 2199[M-L]Space Battleship Yamato 2199 - 09 - Mechanischer Gefangener [FLAC 1080p BD][19066E4A].mkv
Expected results for each line, respectively:
06
12
09
Upvotes: 0
Views: 237
Reputation: 89565
You can use this kind of pattern:
Ruby (works with PHP too):
(?>(?<s>\[(?>[^\]\[]++|\g<s>)*+\])|(?<p>\((?>[^()]++|\g<p>)*+\))|(?<c>\{(?>[^{}]++|\g<c>)*+\})|[^\d\[\](){}]++|(?<n>\d++))++
PHP:
~(?>(\[(?>[^][]++|(?1))*+])|(\((?>[^)(]++|(?2))*+\))|(\{(?>[^}{]++|(?3))*+\})|[^][)(}{\d]++|(?<n>\d++))++~
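.NET (no recursion is available there, so balancing groups are used instead):
(?>\[(?>[^\]\[]+|\[(?<sq>)|\](?<-sq>))*(?(sq)(?!))\]|\((?>[^)(]+|\((?<pa>)|\)(?<-pa>))*(?(pa)(?!))\)|\{(?>[^}{]+|\{(?<cu>)|\}(?<-cu>))*(?(cu)(?!))\}|[^\]\[)(}{\d]+|(?<n>\d+))+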
These patterns can deal with nested brackets and broken structures. Example with PHP:
<?php
// Test subjects: two regular file names, then nested and unbalanced brackets.
$subjects = array(
    "[X-Y] Prelude of 2013 - 06 - From the darkness [FLAC 1080p][E0ECC01D].mkv",
    "c:\Programm Files\Yamato 2199[M-L]Space Battleship Yamato 2199 - 09 - Mechanischer Gefangener [FLAC 1080p BD][19066E4A].mkv",
    "c:\Programm Files\Yam{ato 2195[M-L]Space} Bat{tlesh}ip Yamato (2[19)(9] - (09 10)) - Mechanischer Gefangener [FLAC 1080p BD][19066E4A][.mkv",
    "name 34 [more(]stuff).avi",
    "name 34 [[more]stuff].mkv"
);
// Each bracket type recurses into its own group; group n keeps the last number matched.
$pattern = '~(?>(\[(?>[^][]++|(?1))*+])|(\((?>[^)(]++|(?2))*+\))|(\{(?>[^}{]++|(?3))*+\})|[^][)(}{\d]++|(?<n>\d++))++~';
?><pre><?php
foreach ($subjects as $subject) {
    preg_match($pattern, $subject, $match);
    // n is only set when a number outside brackets was consumed.
    echo isset($match['n']) ? $match['n'] : 'no match';
    echo '<br/>';
}
?></pre>
Explanations:
For better performance, all quantifiers are possessive and all groups except the capturing groups are atomic.
The idea is to repeat, as many times as possible, a pattern (in the first atomic group) that contains a capture group for digits. On each occurrence, the previously captured result is overwritten by the new one, until the pattern fails. Thus you obtain the last number.
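To make the overwrite behaviour concrete, here is a minimal Python sketch that is not part of the answer itself. It assumes that brackets do not nest, because Python's re module has no recursion, and the names LAST_NUMBER and last_number are made up for this illustration. When a group sits inside a repeated group, Python only keeps what the last iteration captured, which is the effect described above.

```python
import re

# Simplified, non-recursive variant (assumption: brackets never nest).
LAST_NUMBER = re.compile(
    r"""(?:
          \[[^\]]*\]        # a square-bracket group, skipped
        | \([^)]*\)         # a parenthesis group, skipped
        | \{[^}]*\}         # a curly-brace group, skipped
        | [^\[\](){}\d]+    # anything that is neither a bracket nor a digit
        | (?P<n>\d+)        # a number outside brackets; each hit replaces the previous one
        )+""",
    re.VERBOSE,
)


def last_number(text):
    """Return the last number seen outside brackets, or None if there is none."""
    match = LAST_NUMBER.match(text)  # anchored at the start of the string
    return match.group("n") if match else None


if __name__ == "__main__":
    print(last_number("[X-Y] Prelude of 2013 - 06 - From the darkness [FLAC 1080p][E0ECC01D].mkv"))
```

Because nothing follows the repeated group, the match simply stops at the first character that no alternative can consume, for example an unclosed bracket.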
Inside the repeated group you can find an alternation between the different possibilities:
The first three are the same for the different kinds of brackets, i.e. [], () and {}, and deal with nested structures:
(\[(?>[^][]++|(?1))*+])
(\((?>[^)(]++|(?2))*+\))
(\{(?>[^}{]++|(?3))*+\})
Detail for the square brackets:
(            # begin capturing group 1
  \[         # opening square bracket
  (?>        # begin atomic group
    [^][]++  # all characters that are not square brackets, one or more times
    |        # OR
    (?1)     # recurse into the subpattern of capturing group 1
  )*+        # repeat the atomic group zero or more times
  ]          # closing square bracket
)            # end capturing group 1
The last two alternatives:
- one that bridges the gaps between the other alternatives:
[^][)(}{\d]++ # all characters that are not brackets or digits, one or more times
- the digits, in the named capture group n:
(?<n>\d++)
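For readers without a recursive regex engine, the same logic can be emulated by a small hand-written scanner. The Python sketch below is only an approximation of the pattern, not a translation of it: the function names are hypothetical, a depth counter stands in for the (?1) recursion, and the scan always starts at the first character, whereas an unanchored regex search would retry at later positions.

```python
PAIRS = {"[": "]", "(": ")", "{": "}"}
CLOSERS = set(PAIRS.values())
DIGITS = "0123456789"


def skip_balanced(text, start):
    """Return the index just past the group opened at text[start], or -1 if it never closes.

    Like each bracket alternative in the pattern, only the bracket type
    that opened the group is tracked; other bracket types are plain text.
    """
    opener = text[start]
    closer = PAIRS[opener]
    depth = 0
    for i in range(start, len(text)):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def last_unbracketed_number(text):
    """Scan left to right and remember the most recent number outside brackets."""
    last = None
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PAIRS:                      # opening bracket: skip the whole group
            end = skip_balanced(text, i)
            if end == -1:                    # unclosed group: stop, as the pattern would
                break
            i = end
        elif ch in CLOSERS:                  # stray closing bracket: stop as well
            break
        elif ch in DIGITS:                   # digit run: overwrite the previous result
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            last = text[i:j]
            i = j
        else:                                # any other character is skipped
            i += 1
    return last


if __name__ == "__main__":
    for name in ("name 34 [more(]stuff).avi", "name 34 [[more]stuff].mkv"):
        print(last_unbracketed_number(name))
```

Stopping at the first unbalanced bracket mirrors how the repeated group ends at the first position where no alternative can match.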
Upvotes: 2
Reputation: 15010
Provided that the bracketed text is never nested and the opening and closing brackets match, you could simply remove all the bracketed text from the input string first, then apply a simple regex to extract the last number.
To remove the bracketed text you could use: \[[^\]]*?\]|\([^)]*?\)|\{[^}]*?\}|\<[^>]*?\>
To parse the last digits in the remaining string: .*\D(\d+)
This looks for the last run of digits that is preceded by a non-digit character. If the match is successful, group 1 will contain all the digits from that run.
You didn't list a language, so I'm simply using PowerShell here to demonstrate the logic and how these two steps work together.
$string = 'c:\Programm Files\Yamato 2198[M-L]Space Battleship Yamato 2199 - 09 - Mechanischer Gefangener [FLAC 1080p BD][19066E4A].mkv'
Write-Host "Input String: '$string'"
$string = $string -replace '\[[^\]]*?\]|\([^)]*?\)|\{[^}]*?\}|\<[^>]*?\>', ""
Write-Host "No Brackets: '$string'"
if ($string -match '.*\D(\d+)') {
    Write-Host "found the following matches"
    $Matches
} else {
    Write-Host "no matches found"
} # end if
Yields
Input String: 'c:\Programm Files\Yamato 2198[M-L]Space Battleship Yamato 2199 - 09 - Mechanischer Gefangener [FLAC 1080p BD][19066E4A].mkv'
No Brackets: 'c:\Programm Files\Yamato 2198Space Battleship Yamato 2199 - 09 - Mechanischer Gefangener .mkv'
found the following matches
Name Value
---- -----
1 09
0 c:\Programm Files\Yamato 2198Space Battleship Yamato 2199 - 09
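The same two-step idea carries over to other languages. As a hedged illustration, here is a small Python sketch of it; the function name last_number_after_stripping is made up for this example, and it relies on the same assumption that brackets are balanced and not nested.

```python
import re

# Step 1: any non-nested [...], (...), {...} or <...> group.
BRACKETED = re.compile(r"\[[^\]]*?\]|\([^)]*?\)|\{[^}]*?\}|<[^>]*?>")
# Step 2: the last run of digits that is preceded by a non-digit character.
LAST_DIGITS = re.compile(r".*\D(\d+)")


def last_number_after_stripping(text):
    """Remove bracketed text, then return the last number that remains (or None)."""
    stripped = BRACKETED.sub("", text)
    match = LAST_DIGITS.match(stripped)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (r"c:\Programm Files\Yamato 2198[M-L]Space Battleship Yamato 2199"
              r" - 09 - Mechanischer Gefangener [FLAC 1080p BD][19066E4A].mkv")
    print(last_number_after_stripping(sample))
```

Note that a string consisting only of digits yields no match here, since the second pattern requires a non-digit character before the number.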
Upvotes: 0