vsong

Reputation: 23

Unsure how to capture slash symbol "/" in my regex expression

I currently have a regular expression, /((http:\/\/.+(.net\/|.com\/))|^\/)(.+)$/gm, that extracts the relative path from an absolute or relative URL. I know the URL will be on either a .com or a .net domain, or it will already be a relative path.

It works fine, except that I don't know how to get the slash into the last capturing group. Some examples:

http://google.com/abcd/efg (captures "abcd/efg", but I want "/abcd/efg")
http://google.com/abcd (captures "abcd", but I want "/abcd")
http://google.com/ (Fail)
http://google.com (Fail)
/abcd (captures "abcd", but I want "/abcd")
/ (Fail)

It feels like I'm missing something obvious. Any help would be appreciated.
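The behaviour described above can be reproduced with a minimal Python sketch. It assumes Python's re module as the host, since the question does not say which language runs the regex. The /.../gm delimiters are dropped and the m flag becomes re.MULTILINE. The helper name last_group is hypothetical. The last capturing group in the question's pattern is group 4. It never contains the leading slash because .com\/, .net\/ or ^\/ inside group 1 has already consumed it.

```python
import re

# The question's pattern; the /.../gm delimiters are dropped and the
# "m" flag is passed as re.MULTILINE.
ORIGINAL_RE = re.compile(r"((http:\/\/.+(.net\/|.com\/))|^\/)(.+)$", re.MULTILINE)


def last_group(url):
    """Return group 4 (the last capturing group), or None if nothing matches."""
    m = ORIGINAL_RE.search(url)
    return m.group(4) if m else None


for url in ["http://google.com/abcd/efg", "http://google.com/abcd",
            "http://google.com/", "http://google.com", "/abcd", "/"]:
    print(repr(url), "->", repr(last_group(url)))
```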

Upvotes: 2

Views: 120

Answers (2)

Wiktor Stribiżew

Reputation: 626845

You cannot achieve that without reordering the pattern and changing where the capturing groups begin and end.

The / that ends both alternatives of the first capturing group, ((http:\/\/.+(.net\/|.com\/))|^\/), should be moved to the start of the last group, (.+).

I suggest using

/(http:\/\/.+(\.net|\.com)|^)(\/.+)$/gm


Details:

  • (http:\/\/.+(\.net|\.com)|^) - Group 1:
    • http:\/\/.+(\.net|\.com) - http://, then one or more characters other than line break characters, then .net or .com captured into Group 2 (if you don't need this group, replace (\.net|\.com) with \.(?:net|com))
    • | - or
    • ^ - start of string
  • (\/.+) - Group 3 (Group 2 if you use the non-capturing \.(?:net|com) variant): a / followed by one or more characters other than line break characters.
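The suggested pattern can be checked against the question's examples with a minimal Python sketch. It again assumes Python's re module, with the m flag mapped to re.MULTILINE. The g flag is unnecessary when each URL is matched separately with re.search. The function name relative_path is hypothetical. Because (\/.+) now starts with the slash, group 3 holds /abcd/efg rather than abcd/efg. The second pattern is the \.(?:net|com) variant, where the path moves to group 2.

```python
import re

# The answer's pattern: group 1 is the "http://...(.net|.com)" prefix or an
# empty match at the start of the line; group 3 is the path with its slash.
PATH_RE = re.compile(r"(http:\/\/.+(\.net|\.com)|^)(\/.+)$", re.MULTILINE)

# Variant with the TLD group made non-capturing; the path becomes group 2.
PATH_RE_NC = re.compile(r"(http:\/\/.+\.(?:net|com)|^)(\/.+)$", re.MULTILINE)


def relative_path(url, pattern=PATH_RE, group=3):
    """Return the path including its leading '/', or None if there is no match."""
    m = pattern.search(url)
    return m.group(group) if m else None


for url in ["http://google.com/abcd/efg", "http://google.com/abcd",
            "http://google.com/", "http://google.com", "/abcd", "/"]:
    print(repr(url), "->", repr(relative_path(url)),
          repr(relative_path(url, PATH_RE_NC, 2)))
```

Inputs with nothing after the final slash, such as http://google.com/ and /, still do not match. This is because .+ requires at least one character after the /.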

Upvotes: 1

xenteros

Reputation: 15842

What about this:

(?<!http:)\/[^\/]*
Each match is a / followed by the text up to, but not including, the next /.

For example, for http://google.com/abc/def/ghi there will be four matches:

  1. /google.com
  2. /abc
  3. /def
  4. /ghi

Concatenate all of them except the first and you get the desired path.
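This approach can be sketched in Python, assuming Python's re module. The name path_from_segments is a hypothetical helper. The lookbehind used here is (?<!http:). It rejects the first slash of http://, which is preceded by http:, so the second slash starts the host segment /google.com. A lookbehind of http:\/\/ would exclude neither slash of http://, since neither is preceded by the full text http://. An extra lone / match would then appear first.

```python
import re

# A slash not preceded by "http:", followed by everything up to the next slash.
# The pattern has no capturing groups, so findall returns the whole matches.
SEGMENT_RE = re.compile(r"(?<!http:)\/[^\/]*")


def path_from_segments(url):
    """Join every match except the first (the host) back into a path."""
    segments = SEGMENT_RE.findall(url)
    return "".join(segments[1:])


url = "http://google.com/abc/def/ghi"
print(SEGMENT_RE.findall(url))   # ['/google.com', '/abc', '/def', '/ghi']
print(path_from_segments(url))   # /abc/def/ghi
```

As written, the helper assumes an absolute URL. For a relative path such as /abcd, the only match is the path itself, so dropping the first match returns an empty string.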

Upvotes: 1
