Reputation: 3852
The lines to match against are:
part1a_part1b__part1c_part1d_part3.extension
part1a_part1b__part1c_part1d__part3.extension
part1a_part1b__part1c_part1d_part2short_part3.extension
part1a_part1b__part1c_part1d_part2short__part3.extension
part1a_part1b__part1c_part1d_part2_part3.extension
part1a_part1b__part1c_part1d_part2__part3.extension
part1a_part1b__part1c_part1d_part2full_part3.extension
part1a_part1b__part1c_part1d_part2full__part3.extension
part1a_part1b__part1c_part1d_part2short-part3.extension
part1a_part1b__part1c_part1d_part2-part3.extension
part1a_part1b__part1c_part1d_part2full-part3.extension
part1a_part1b__part1c_part1d_part4.extension
part1a_part1b__part1c_part1d__part4.extension
The desired match should give exactly part1a_part1b__part1c_part1d for all the above lines except the last two. That is to say, the "stem" consists of an arbitrary number of part1 fields, followed by an optional part2 (in a few limited forms), and the line must end with part3.extension.
Right now, I only got as far as
(?P<stem>[[:alnum:]_-]+)(?=(|part2short|part2|part2full))[_-]+part3\.extension
with which the matched "stem" values for the lines above are:
part1a_part1b__part1c_part1d
part1a_part1b__part1c_part1d_
part1a_part1b__part1c_part1d_part2short
part1a_part1b__part1c_part1d_part2short_
part1a_part1b__part1c_part1d_part2
part1a_part1b__part1c_part1d_part2_
part1a_part1b__part1c_part1d_part2full
part1a_part1b__part1c_part1d_part2full_
part1a_part1b__part1c_part1d_part2short
part1a_part1b__part1c_part1d_part2
part1a_part1b__part1c_part1d_part2full
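These results can be reproduced with a short Python sketch. It assumes Python's `re` module, which has no POSIX bracket classes, so `[[:alnum:]_-]` is written as `[A-Za-z0-9_-]` here. Two things explain the output: the lookahead `(?=(|part2short|part2|part2full))` always succeeds because of its empty alternative, and the greedy `+` gives back characters only until `[_-]+part3\.extension` can match, so an extra separator or a part2 string stays inside the stem.

```python
import re

# The attempt from the question, with [[:alnum:]_-] written as [A-Za-z0-9_-]
# because Python's re module does not support POSIX bracket classes.
ATTEMPT = re.compile(
    r"(?P<stem>[A-Za-z0-9_-]+)(?=(|part2short|part2|part2full))[_-]+part3\.extension"
)

lines = [
    "part1a_part1b__part1c_part1d_part3.extension",
    "part1a_part1b__part1c_part1d__part3.extension",
    "part1a_part1b__part1c_part1d_part2short_part3.extension",
]

# The greedy + first runs up to the ".", then backtracks only as far as
# needed for [_-]+part3 to match, leaving "_" or "part2short" in the stem.
stems = [ATTEMPT.search(line).group("stem") for line in lines]
for s in stems:
    print(s)
```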
Could you help explain how to match exactly part1a_part1b__part1c_part1d in all of the above lines except the last two, if that is possible?
Upvotes: 0
Views: 881
Reputation: 784938
You may use this regex, which combines a non-greedy match with a lookahead containing an optional group:
(?m)^(?P<stem>[[:alnum:]_-]+?)(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)
(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)
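is a positive lookahead that asserts the line ends with [-_]+part3.extension, optionally preceded by a [-_]+part2, [-_]+part2short or [-_]+part2full string. Because the stem quantifier +? is non-greedy, the stem stops at the first position where this lookahead succeeds, so the separators and any part2 string are left out of the match.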
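A minimal Python sketch of this answer, run against all thirteen lines from the question. Python's `re` module does not support POSIX classes such as `[[:alnum:]]`, so this sketch assumes `[A-Za-z0-9_-]` as an equivalent for ASCII input. The inline `(?m)` flag makes `^` and `$` match at each line, so the whole list can be scanned in one `finditer` pass.

```python
import re

# The answer's pattern, with [[:alnum:]_-] written as [A-Za-z0-9_-]
# because Python's re module has no POSIX bracket classes.
STEM = re.compile(
    r"(?m)^(?P<stem>[A-Za-z0-9_-]+?)"
    r"(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)"
)

text = """\
part1a_part1b__part1c_part1d_part3.extension
part1a_part1b__part1c_part1d__part3.extension
part1a_part1b__part1c_part1d_part2short_part3.extension
part1a_part1b__part1c_part1d_part2short__part3.extension
part1a_part1b__part1c_part1d_part2_part3.extension
part1a_part1b__part1c_part1d_part2__part3.extension
part1a_part1b__part1c_part1d_part2full_part3.extension
part1a_part1b__part1c_part1d_part2full__part3.extension
part1a_part1b__part1c_part1d_part2short-part3.extension
part1a_part1b__part1c_part1d_part2-part3.extension
part1a_part1b__part1c_part1d_part2full-part3.extension
part1a_part1b__part1c_part1d_part4.extension
part1a_part1b__part1c_part1d__part4.extension
"""

# The lazy +? stops at the first position where the lookahead succeeds,
# so trailing separators and the optional part2 are excluded from the stem.
# The two part4 lines never satisfy the lookahead and produce no match.
stems = [m.group("stem") for m in STEM.finditer(text)]
print(len(stems), set(stems))  # expected: 11 matches, all part1a_part1b__part1c_part1d
```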
Upvotes: 1
Reputation: 163207
You could match the first four parts together with the underscores between them, and use a positive lookahead that asserts that the string ends with part3.extension:
^(?P<stem>[^_]+_[^_]+__[^_]+_[^_]+)(?=.*part3\.extension$)
That would match:
^                       # Begin of the string
(?P<stem>               # Named capturing group stem
  [^_]+_                # Match not _ one or more times, then _
  [^_]+__               # Match not _ one or more times, then __
  [^_]+_                # Match not _ one or more times, then _
  [^_]+                 # Match not _ one or more times
)                       # Close named capturing group
(?=                     # A positive lookahead that asserts what follows
  .*part3\.extension$   # Match part3.extension at the end of the string
)                       # Close lookahead
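The commented form of this pattern can be used directly with Python's `re.VERBOSE` flag, which ignores unescaped whitespace and `#` comments in the pattern. A minimal sketch that checks one line at a time with `re.match`; the helper name `stem` is hypothetical. Note that the pattern hard-codes the `_`, `__`, `_` layout of exactly four part1 fields, while the question allows an arbitrary number; the last call uses a hypothetical line with a fifth field (`part1e`) to show the stem being cut after `part1d`.

```python
import re

# The answer's pattern in verbose form; re.VERBOSE ignores the layout
# whitespace and the # comments.
STEM = re.compile(
    r"""
    ^                       # Begin of the string
    (?P<stem>               # Named capturing group stem
      [^_]+_                # One or more non-_ characters, then _
      [^_]+__               # One or more non-_ characters, then __
      [^_]+_                # One or more non-_ characters, then _
      [^_]+                 # One or more non-_ characters
    )                       # Close named capturing group
    (?=                     # Positive lookahead: what follows must be
      .*part3\.extension$   # anything, then part3.extension at the end
    )                       # Close lookahead
    """,
    re.VERBOSE,
)


def stem(line):
    """Return the captured stem, or None if the line does not match."""
    m = STEM.match(line)
    return m.group("stem") if m else None


print(stem("part1a_part1b__part1c_part1d_part2full-part3.extension"))
print(stem("part1a_part1b__part1c_part1d_part4.extension"))  # None: no part3
# Hypothetical line with a fifth part1 field: only four fields are captured.
print(stem("part1a_part1b__part1c_part1d_part1e_part3.extension"))
```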
Upvotes: 1