asymmetric

Reputation: 3890

Matching optional groups with lookahead in JavaScript regex

I'm trying to solve a string matching problem with regexes. I need to match URLs of this form:

http://soundcloud.com/okapi23/dont-turn-your-back/

And I need to "reject" URLs of this form:

http://soundcloud.com/okapi23/sets/happily-reversed/

The trailing '/' is obviously optional.

So basically:

What I have come up with so far is http(s)?://(www\.)?soundcloud\.com/.+/(?!sets)\b(/.+)?, which fails.

Any suggestions? Are there any libraries that would simplify the task (for example, making trailing slashes optional)?

Upvotes: 2

Views: 921

Answers (3)

ridgerunner

Reputation: 34385

Assuming that the OP wants to test whether a given string is a URL that meets the following requirements:

  • URL scheme must be either http: or https:.
  • URL authority must be either //soundcloud.com or //www.soundcloud.com.
  • URL path must exist and must contain 2 or 3 path segments.
  • The second path segment must not be: "sets".
  • Each path segment must consist of one or more "words" consisting of only alphanumeric characters ([A-Za-z0-9]) and multiple words are separated by exactly one dash or underscore.
  • The URL must have no query or fragment component.
  • The URL path may end with an optional "/".
  • The URL should match case insensitively.

Here is a tested JavaScript function (with a fully commented regex) which does the trick:

function isValidCustomUrl(text) {
    /* Here is the regex commented in free-spacing mode:
    # Match specific URL having non-"sets" 2nd path segment.
    ^                          # Anchor to start of string.
    https?:                    # URL Scheme (http or https).
    //                         # Begin URL Authority.
    (?:www\.)?                 # Optional www subdomain.
    soundcloud\.com            # URL DNS domain.
    /                          # 1st path segment (can be: "sets").
    [A-Za-z0-9]+               # 1st word-portion (required).
    (?:                        # Zero or more extra word portions.
      [-_]                     # only if separated by one - or _.
      [A-Za-z0-9]+             # Additional word-portion.
    )*                         # Zero or more extra word portions.
    (?!/sets(?:/|$))           # Assert 2nd segment not "sets".
    (?:                        # 2nd and 3rd path segments.
      /                        # Additional path segment.
      [A-Za-z0-9]+             # 1st word-portion.
      (?:                      # Zero or more extra word portions.
        [-_]                   # only if separated by one - or _.
        [A-Za-z0-9]+           # Additional word-portion.
      )*                       # Zero or more extra word portions.
    ){1,2}                     # 2nd path segment required, 3rd optional.
    /?                         # URL may end with optional /.
    $                          # Anchor to end of string.
    */
    // Same regex in javascript syntax:
    var re = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;
    // test() already returns a boolean, so return it directly.
    return re.test(text);
}
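JavaScript regex literals have no free-spacing mode, which is why the commented version above lives inside a block comment. As a rough sketch of one alternative, the same pattern can be assembled from commented string parts with `new RegExp`. The names `SEGMENT`, `customUrlRe`, and `isValidCustomUrlFromParts` are made up for this illustration and are not part of the answer above.

```javascript
// Hypothetical helper: one path segment made of alphanumeric "words"
// joined by single dashes or underscores.
const SEGMENT = "[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*";

// Inside a string, "/" needs no escaping, but "\." must be written "\\.".
const customUrlRe = new RegExp(
  "^https?://" +                  // scheme (http or https)
  "(?:www\\.)?soundcloud\\.com" + // authority, optional www
  "/" + SEGMENT +                 // 1st path segment
  "(?!/sets(?:/|$))" +            // 2nd segment must not be "sets"
  "(?:/" + SEGMENT + "){1,2}" +   // 2nd segment required, 3rd optional
  "/?$",                          // optional trailing slash, end of string
  "i"                             // case-insensitive
);

// Hypothetical wrapper, equivalent in intent to isValidCustomUrl above.
function isValidCustomUrlFromParts(text) {
  return customUrlRe.test(text);
}

console.log(isValidCustomUrlFromParts("http://soundcloud.com/okapi23/dont-turn-your-back/"));
console.log(isValidCustomUrlFromParts("http://soundcloud.com/okapi23/sets/happily-reversed/"));
```

The first call uses the URL from the question that should match, the second the one that should be rejected.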

Upvotes: 5

Ωmega

Reputation: 43673

I suggest going with this regex pattern:

^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$
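A quick way to check the pattern is to run it against the two URLs from the question. This is only a sketch; the variable name `setsFilterRe` is an assumption made for the example.

```javascript
// The pattern above as a JavaScript regex literal (slashes already escaped).
const setsFilterRe = /^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$/;

// URL from the question that should be accepted.
console.log(setsFilterRe.test("http://soundcloud.com/okapi23/dont-turn-your-back/"));

// URL from the question that should be rejected (2nd segment is "sets").
console.log(setsFilterRe.test("http://soundcloud.com/okapi23/sets/happily-reversed/"));
```

Note that, as written, this pattern has no optional `www.` part, so a `www.soundcloud.com` URL would not match unless `(?:www\.)?` is added after the scheme.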

Upvotes: 1

Mark Byers

Reputation: 838106

Instead of . use [a-zA-Z][\w-]* which means "match a letter followed by any number of letters, numbers, underscores or hyphens".

^https?://(www\.)?soundcloud\.com/[a-zA-Z][\w-]*/(?!sets(/|$))[a-zA-Z][\w-]*(/[a-zA-Z][\w-]*)?/?$

To get the optional trailing slash, use /?$.

In a JavaScript regular expression literal, all forward slashes must be escaped.
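For illustration, here is a sketch of the pattern above written as a JavaScript literal with every forward slash escaped. The variable name `letterFirstRe` is an assumption made for the example.

```javascript
// Same pattern as above, with each "/" written as "\/" so it can live
// inside a /.../ regex literal.
const letterFirstRe = /^https?:\/\/(www\.)?soundcloud\.com\/[a-zA-Z][\w-]*\/(?!sets(\/|$))[a-zA-Z][\w-]*(\/[a-zA-Z][\w-]*)?\/?$/;

// URL from the question that should be accepted.
console.log(letterFirstRe.test("http://soundcloud.com/okapi23/dont-turn-your-back/"));

// URL from the question that should be rejected.
console.log(letterFirstRe.test("http://soundcloud.com/okapi23/sets/happily-reversed/"));
```

Because each segment must start with a letter, a path such as `/okapi23/2nd-track` would not match this pattern.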

Upvotes: 4
