Reputation: 3890
I'm trying to solve a string matching problem with regexes. I need to match URLs of this form:
http://soundcloud.com/okapi23/dont-turn-your-back/
And I need to "reject" URLs of this form:
http://soundcloud.com/okapi23/sets/happily-reversed/
The trailing '/' is obviously optional.
So basically:
What I've come up with so far is http(s)?://(www\.)?soundcloud\.com/.+/(?!sets)\b(/.+)?, which fails.
Any suggestions? Are there any libraries that would simplify the task (for example, making trailing slashes optional)?
Upvotes: 2
Views: 921
Reputation: 34385
Assuming that the OP wants to test to see if a given string contains a URL which meets the following requirements:
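The URL scheme must be either http: or https:.
The URL authority must be either //soundcloud.com or //www.soundcloud.com.
The URL path must have two or three segments.
The second path segment must not be "sets".
Each path segment must consist of one or more "words" made up of alphanumeric characters only ([A-Za-z0-9]), and multiple words are separated by exactly one dash or underscore.
The URL path may end with an optional "/".

Here is a tested JavaScript function (with a fully commented regex) which does the trick: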
function isValidCustomUrl(text) {
    /* Here is the regex commented in free-spacing mode:
    # Match specific URL having non-"sets" 2nd path segment.
    ^                        # Anchor to start of string.
    https?:                  # URL Scheme (http or https).
    //                       # Begin URL Authority.
    (?:www\.)?               # Optional www subdomain.
    soundcloud\.com          # URL DNS domain.
    /                        # 1st path segment (can be: "sets").
    [A-Za-z0-9]+             # 1st word-portion (required).
    (?:                      # Zero or more extra word portions.
      [-_]                   # only if separated by one - or _.
      [A-Za-z0-9]+           # Additional word-portion.
    )*                       # Zero or more extra word portions.
    (?!/sets(?:/|$))         # Assert 2nd segment not "sets".
    (?:                      # 2nd and 3rd path segments.
      /                      # Additional path segment.
      [A-Za-z0-9]+           # 1st word-portion.
      (?:                    # Zero or more extra word portions.
        [-_]                 # only if separated by one - or _.
        [A-Za-z0-9]+         # Additional word-portion.
      )*                     # Zero or more extra word portions.
    ){1,2}                   # 2nd path segment required, 3rd optional.
    /?                       # URL may end with optional /.
    $                        # Anchor to end of string.
    */
    // Same regex in JavaScript syntax:
    var re = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;
    return re.test(text);
}
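As a quick check, the function can be run against the two URLs from the question. The sketch below repeats the regex so that it runs on its own; the samples array is only illustrative, and its third entry is an assumed extra case rather than one taken from the question.

```javascript
// Same regex as in the function above, repeated so this snippet runs standalone.
var re = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;

function isValidCustomUrl(text) {
    return re.test(text);
}

var samples = [
    "http://soundcloud.com/okapi23/dont-turn-your-back/",   // track URL from the question
    "http://soundcloud.com/okapi23/sets/happily-reversed/", // "sets" URL from the question
    "https://www.soundcloud.com/okapi23/sets"               // assumed extra case: "sets" with no trailing slash
];

// Test each sample individually so map's extra arguments are not passed along.
var results = samples.map(function (url) {
    return isValidCustomUrl(url);
});
console.log(results);
```

With these inputs the first URL should be accepted and the other two rejected, since the lookahead fails whenever the second segment is exactly "sets".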
Upvotes: 5
Reputation: 43673
I suggest going with this regex pattern:
^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$
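A minimal sketch of how this pattern behaves on the question's two URLs, wrapped in a hypothetical helper named isTrackUrl (the name is an assumption for illustration). Note that, as written, the pattern does not accept a www. prefix.

```javascript
// Pattern from the answer above. The negative lookahead rejects
// "/<anything>/sets" before the 2-3 path segments are consumed.
var pattern = /^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$/;

// Hypothetical helper name, used only for illustration.
function isTrackUrl(url) {
    return pattern.test(url);
}

console.log(isTrackUrl("http://soundcloud.com/okapi23/dont-turn-your-back/"));
console.log(isTrackUrl("http://soundcloud.com/okapi23/sets/happily-reversed/"));
```

Because path segments are matched with [^\/]+ here, any non-slash characters are allowed in a segment, which is looser than restricting segments to letters, digits, dashes and underscores.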
Upvotes: 1
Reputation: 838106
Instead of "." use [a-zA-Z][\w-]*, which means "match a letter followed by any number of letters, numbers, underscores or hyphens".
^https?://(www\.)?soundcloud\.com/[a-zA-Z][\w-]*/(?!sets(/|$))[a-zA-Z][\w-]*(/[a-zA-Z][\w-]*)?/?$
To get the optional trailing slash, use /?$.
In a JavaScript regular expression literal, all the forward slashes must be escaped.
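To illustrate the escaping, here is the pattern above written as a JavaScript regex literal. The RegExp constructor form is shown as an alternative that is not part of the answer: there the slashes can stay unescaped, but each backslash has to be doubled inside the string. The variable names are assumptions for illustration.

```javascript
// Regex literal: every "/" inside the pattern is written as "\/".
var literal = /^https?:\/\/(www\.)?soundcloud\.com\/[a-zA-Z][\w-]*\/(?!sets(\/|$))[a-zA-Z][\w-]*(\/[a-zA-Z][\w-]*)?\/?$/;

// Alternative (not from the answer): the RegExp constructor takes a string, so "/"
// needs no escaping, but each backslash must be doubled inside the string literal.
var constructed = new RegExp("^https?://(www\\.)?soundcloud\\.com/[a-zA-Z][\\w-]*/(?!sets(/|$))[a-zA-Z][\\w-]*(/[a-zA-Z][\\w-]*)?/?$");

var trackUrl = "http://soundcloud.com/okapi23/dont-turn-your-back/";
var setsUrl = "http://soundcloud.com/okapi23/sets/happily-reversed/";

console.log(literal.test(trackUrl), constructed.test(trackUrl));
console.log(literal.test(setsUrl), constructed.test(setsUrl));
```

Both forms should behave identically: the track URL is accepted and the "sets" URL is rejected.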
Upvotes: 4