d-_-b

Reputation: 23171

Javascript match for string that doesn't contain a character

I'm trying to use JavaScript to match a URL pattern containing exactly one directory, with an optional trailing slash.

For example

This should match:

text and http://twitter.com/path and more text

This should not match:

text and http://twitter.com/path/other/directories and more text

Even though the shorter string exists in the longer one, I don't want the longer one to return anything.

Is this possible?


Here's what I've tried so far:

The approach was to match the URL, and then use either a negated character class or a negative lookahead to reject anything longer.

I've tried the following:

/(https?:)?(\/\/)?(www\.)?twitter\.com\/[a-z0-9_+-]+\/?(?![a-z0-9_+-])/ig

This was meant to look for a twitter URL, with a \w+ path, with optional trailing slash, not followed by any other \w+.

While this doesn't include the second directory in its match, I wanted it to not match the string at all.

/(https?:)?(\/\/)?(www\.)?twitter\.com\/\w+[^\/\w]*/ig

This was meant to find the URL, but exclude slashes and \w following. Similar to the previous try, it still matches the long links.

I've tried variations like this, but can't get it to work:

var regex1 = /(https?:)?(\/\/)?(www\.)?twitter\.com\/\w+(?!\/\w+)/ig;

var regex2 = /(https?:)?(\/\/)?(www\.)?twitter\.com\/\w+[^\/\w]*/ig;

var shouldMatch = 'text https://twitter.com/page text';
var shouldNotMatch = 'text https://twitter.com/page/status/123 text';

console.log('regex1 should match', shouldMatch.match(regex1));

console.log('regex1 should return null', shouldNotMatch.match(regex1));

console.log('regex2 should match', shouldMatch.match(regex2));

console.log('regex2 should return null', shouldNotMatch.match(regex2));

Upvotes: 0

Views: 75

Answers (2)

Nick

Reputation: 147156

You could use a negative lookahead at the end of the valid part, asserting that the page name is followed neither by a / plus a word character nor by another word character. The second alternative (\w) is needed because, without it, the engine can backtrack to a shorter page name and still match, for example at http://twitter.com/pag.

var regex1 = /(https?:\/\/)?(www\.)?twitter\.com\/\w+(?!\/\w|\w)/ig;


var shouldMatch = 'https://twitter.com/page is a valid url';
var shouldNotMatch = 'https://twitter.com/page/status/123 is not valid';

console.log('regex1 should match', shouldMatch.match(regex1));

console.log('regex1 should return null', shouldNotMatch.match(regex1));
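
To see why the lookahead needs the extra \w alternative, here is a minimal sketch that compares the pattern with and without it. The variable names are made up for illustration, and the optional scheme and www groups are left out for brevity.

```javascript
// Hypothetical names; the optional scheme/www groups are omitted for brevity.
const withoutWordGuard = /twitter\.com\/\w+(?!\/\w)/i;
const withWordGuard = /twitter\.com\/\w+(?!\/\w|\w)/i;

const longUrl = 'https://twitter.com/page/status/123';
const shortUrl = 'https://twitter.com/page';

// Without the guard, \w+ gives back one character so the lookahead passes.
const backtracked = longUrl.match(withoutWordGuard);
console.log(backtracked && backtracked[0]); // 'twitter.com/pag'

// With the guard, every shorter page name is followed by a word character,
// so the lookahead fails at each backtracking step and nothing matches.
console.log(longUrl.match(withWordGuard)); // null

// The single-directory URL still matches in full.
const accepted = shortUrl.match(withWordGuard);
console.log(accepted && accepted[0]); // 'twitter.com/page'
```

Note that String.prototype.match returns null, not an empty array, when nothing matches.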

Upvotes: 1

CAustin

Reputation: 4614

I would use positive lookahead to assert that there's either a whitespace character or the end of the string following the last group of \w+.

(?:https?:\/\/)?(?:www\.)?twitter\.com\/\w+(?=\s|$)

Demo: https://regex101.com/r/EMHxq9/3

I also used non-capturing groups in place of capturing groups because it doesn't look like you're back referencing anything, and combined the optional "https:" and "//" parts because it would be weird to have one and not the other.
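
As a rough sketch, the pattern above can be run against the strings from the question. One assumption is added here that is not in the pattern as posted: a \/? before the lookahead, so that the optional trailing slash mentioned in the question is accepted. The variable names are illustrative only.

```javascript
// Assumption: \/? is added before the lookahead to allow the optional
// trailing slash from the question; the rest follows the pattern above.
const singleDirUrl = /(?:https?:\/\/)?(?:www\.)?twitter\.com\/\w+\/?(?=\s|$)/gi;

const samples = [
  'text and http://twitter.com/path and more text',
  'text and http://twitter.com/path/ and more text',
  'text and http://twitter.com/path/other/directories and more text',
];

// With the g flag, match returns an array of all matches, or null if none.
const results = samples.map((s) => s.match(singleDirUrl));
console.log(results);
// [ [ 'http://twitter.com/path' ], [ 'http://twitter.com/path/' ], null ]
```

Because the lookahead only accepts whitespace or the end of the string, a URL followed directly by punctuation (for example a full stop at the end of a sentence) would not match; whether that is acceptable depends on the input text.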

Upvotes: 1
