Reputation: 23
I currently have a regular expression /((http:\/\/.+(.net\/|.com\/))|^\/)(.+)$/gm
that extracts the relative URL of an absolute or relative path (I know that the path will either be a .com or .net domain, or it could just be the relative path altogether).
It works fine, except that I don't know how to get the slash into the last capturing group. Some examples:
http://google.com/abcd/efg (captures "abcd/efg", but I want "/abcd/efg")
http://google.com/abcd (captures "abcd", but I want "/abcd")
http://google.com/ (Fail)
http://google.com (Fail)
/abcd (captures "abcd", but I want "/abcd")
/ (Fail)
It feels like I am missing something obvious. Any help would be appreciated.
Upvotes: 2
Views: 120
Reputation: 626845
You cannot achieve that without reordering the pattern and moving the boundary of the capturing groups.
In the first capturing group, ((http:\/\/.+(.net\/|.com\/))|^\/), the / slash should be moved to the second group, (.+).
I suggest using
/(http:\/\/.+(\.net|\.com)|^)(\/.+)$/gm
See the regex demo
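As a rough check, the suggested pattern can be run against the inputs from the question. This is only a sketch: the helper name `relativePath` is made up for illustration, the `g` flag is dropped so that `exec()` keeps no state between calls, and the `lenient` variant (`.*` instead of `.+`) is an assumption about what might be wanted if a lone `/` should also be returned.

```javascript
// Pattern from the answer, minus the g flag so exec() keeps no state between calls.
const strict = /(http:\/\/.+(\.net|\.com)|^)(\/.+)$/m;
// Assumed variant (not from the answer): .* instead of .+ lets a lone "/" match too.
const lenient = /(http:\/\/.+(\.net|\.com)|^)(\/.*)$/m;

// Hypothetical helper: returns the last capturing group, or null when nothing matches.
function relativePath(pattern, url) {
  const match = pattern.exec(url);
  return match ? match[3] : null;
}

const samples = [
  "http://google.com/abcd/efg",
  "http://google.com/abcd",
  "http://google.com/",
  "http://google.com",
  "/abcd",
  "/",
];

for (const url of samples) {
  console.log(url, "->", relativePath(strict, url), "|", relativePath(lenient, url));
}
```

With the pattern as suggested, the three inputs marked (Fail) in the question still produce no match. The lenient variant returns "/" for http://google.com/ and for /, while http://google.com still yields null because it contains no path at all.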
Details:
- (http:\/\/.+(\.net|\.com)|^) - Group 1:
  - http:\/\/.+(\.net|\.com) - http://, any 1+ chars other than line break chars, then .net or .com captured into Group 2 (if this group is redundant, replace (\.net|\.com) with \.(?:net|com))
  - | - or
  - ^ - start of string (or of a line, given the m flag)
- (\/.+) - Group 3 (or 2): a / slash and any 1+ chars other than line break chars.
Upvotes: 1
Reputation: 15842
What about this:
(?<!(http:\/\/))\/[^\/]*
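Each match is the text from a / (inclusive) up to the next / (exclusive).
E.g., for http://google.com/abc/def/ghi
there will be five matches: /, /google.com, /abc, /def and /ghi (each slash of // starts a match of its own, because neither one is preceded by the full http://).
Just concatenate all except the first two and you'll get the desired path.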
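A minimal sketch of this approach in JavaScript, assuming an engine with lookbehind support (ES2018 or later). The helper name `pathFromPieces` and the rule for how many leading pieces to drop are assumptions for illustration. Note that neither slash in `http://` is preceded by the full `http://`, so the lookbehind does not exclude them: the first slash matches on its own and the second one matches together with the host.

```javascript
// Pattern from the answer; the g flag makes match() return every full match.
const piece = /(?<!(http:\/\/))\/[^\/]*/g;

// Hypothetical helper: splits the input into "/..." pieces and joins the path part.
function pathFromPieces(url) {
  const pieces = url.match(piece) || [];
  // Assumption: an absolute URL contributes two leading pieces,
  // the lone "/" from "//" and "/host", which are not part of the path.
  const skip = url.startsWith("http://") ? 2 : 0;
  return pieces.slice(skip).join("");
}

const url = "http://google.com/abc/def/ghi";
console.log(url.match(piece));    // five pieces: "/", "/google.com", "/abc", "/def", "/ghi"
console.log(pathFromPieces(url)); // "/abc/def/ghi"
console.log(pathFromPieces("/abcd")); // "/abcd"
```

For an input with no path, such as http://google.com, this sketch returns an empty string rather than null, so the caller has to decide how to treat that case.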
Upvotes: 1