Reputation: 60027
I'm performing regex matching in .NET against strings that look like this:
1;#Lists/General Discussion/Waffles Win
2;#Lists/General Discussion/Waffles Win/2_.000
3;#Lists/General Discussion/Waffles Win/3_.000
I need to match the URL portion without the numbers at the end, so that I get this:
Lists/General Discussion/Waffles Win
This is the regex I'm trying:
(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*
The problem is that the last group is being included as part of the middle group's match. I've also tried without the * at the end but then only the first string above matches and not the rest.
I have the multi-line option enabled. Any ideas?
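A minimal reproduction of the problem, for reference. The class and method names here (GreedyRepro, CaptureUrl) are made up for illustration; only the pattern is the one quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyRepro
{
    // The pattern from the question, unchanged.
    const string Pattern = @"(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*";

    // Returns what the named group "url" captures for one input string,
    // or null if the pattern does not match at all.
    public static string CaptureUrl(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.Multiline);
        return m.Success ? m.Groups["url"].Value : null;
    }

    static void Main()
    {
        // The greedy .+ runs to the end of the line, and the trailing group,
        // being allowed to repeat zero times (*), never has to claim anything.
        Console.WriteLine(CaptureUrl("2;#Lists/General Discussion/Waffles Win/2_.000"));
        // prints: Lists/General Discussion/Waffles Win/2_.000
    }
}
```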
Upvotes: 3
Views: 1506
Reputation: 99355
You could try
^(\d+;#)([^/]+(/[^\d][^/]*)*)
and take the 2nd group. The first group matches the 1;# prefix; the second group starts with the first part of the URL (assumed to contain any character other than /), followed by any number of segments, each made up of a /, then a non-digit, then anything other than /.
Tested on this site, appears to do what you want. Give it a try with some more samples.
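In C#, pulling out the second group might look roughly like this. It is only a sketch: the class and method names (SecondGroupSketch, ExtractUrl) are invented for the example, and each sample string is matched on its own rather than as one multi-line block.

```csharp
using System;
using System.Text.RegularExpressions;

static class SecondGroupSketch
{
    // Group 1 is the "N;#" prefix, group 2 is the URL without the numeric tail.
    const string Pattern = @"^(\d+;#)([^/]+(/[^\d][^/]*)*)";

    // Returns group 2 for a single input string, or null when nothing matches.
    public static string ExtractUrl(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        string[] samples =
        {
            "1;#Lists/General Discussion/Waffles Win",
            "2;#Lists/General Discussion/Waffles Win/2_.000",
            "3;#Lists/General Discussion/Waffles Win/3_.000",
        };

        // Each sample prints: Lists/General Discussion/Waffles Win
        foreach (string s in samples)
        {
            Console.WriteLine(ExtractUrl(s));
        }
    }
}
```

One thing to watch for with more samples: because every segment after the first must start with a non-digit, a real folder whose name begins with a digit would also be cut off at that point.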
Upvotes: 0
Reputation: 89171
A few different alternatives:
@"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"
This matches as few path segments as possible, followed by an optional last part, and the end of the line.
@"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"
This matches as many path segments as possible, as long as it is not the digit-part at the end of the line.
@"^\d+;#(.*?)(?:/\d+_\.\d+)?$"
This matches as few characters as possible, followed by an optional last part, and the end of the line.
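A quick way to compare the three alternatives against the sample strings from the question is sketched below. The class and method names (AlternativesSketch, ExtractUrl) are placeholders for the example, and each string is matched individually.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternativesSketch
{
    // The three patterns above, in the same order.
    public static readonly string[] Patterns =
    {
        @"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$",
        @"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)",
        @"^\d+;#(.*?)(?:/\d+_\.\d+)?$",
    };

    public static readonly string[] Samples =
    {
        "1;#Lists/General Discussion/Waffles Win",
        "2;#Lists/General Discussion/Waffles Win/2_.000",
        "3;#Lists/General Discussion/Waffles Win/3_.000",
    };

    // Returns capture group 1 for one pattern and one input string,
    // or null when the pattern does not match.
    public static string ExtractUrl(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Every pattern/sample combination prints:
        // Lists/General Discussion/Waffles Win
        foreach (string pattern in Patterns)
        {
            foreach (string sample in Samples)
            {
                Console.WriteLine(ExtractUrl(pattern, sample));
            }
        }
    }
}
```

The sketch feeds in one string at a time on purpose. If all the records sit in a single multi-line string, note that [^/] also matches newline characters, so splitting the input into lines first (or using [^/\r\n] instead) is the safer route.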
Upvotes: 4