roryf

Reputation: 30160

Regex to match www.example.com only if http:// not present

I have the following regex that isn't working. I want to match the string 'www.example.com' but not the string 'http://www.example.com' (or 'anythingwww.example.com' for that matter):

/\bwww\.\w.\w/ig

This is used in JavaScript like this:

text = text.replace(/\bwww\.\w.\w/ig, 'http://$&');

I know the second part of the regex doesn't work correctly either, but it is the http:// part that is confusing me. It currently matches 'http://www.example.com', resulting in output of 'http://http://www.example.com'.

Upvotes: 3

Views: 2039

Answers (5)

Ben Blank

Reputation: 56572

Perhaps try something like this?

text = text.replace(/(^|\s)(www(?:\.\w+){2,})/ig, "$1http://$2");

This will match URLs such as:

  • "www.example.com" -> "http://www.example.com"
  • "Visit www.example.com" -> "Visit http://www.example.com"
  • "Visit www.example.co.uk" -> "Visit http://www.example.co.uk"

But not:

  • "http://www.example.com"
  • "ftp.example.com"
  • "www.com"
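
The cases listed above can be checked by running them through the same pattern. This is only a minimal sketch: the helper name addHttp and the samples array are made up for illustration and are not part of the answer.

```javascript
// Hypothetical helper wrapping the replace call from this answer.
function addHttp(text) {
  // (^|\s) requires start of string or whitespace right before "www";
  // (?:\.\w+){2,} requires at least two dot-separated labels after it.
  return text.replace(/(^|\s)(www(?:\.\w+){2,})/ig, "$1http://$2");
}

// Sample inputs taken from the two lists above.
const samples = [
  "www.example.com",
  "Visit www.example.com",
  "Visit www.example.co.uk",
  "http://www.example.com",
  "ftp.example.com",
  "www.com",
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", JSON.stringify(addHttp(s)));
}
```

The first three inputs should come back with http:// inserted, and the last three should come back unchanged.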

Upvotes: 3

Matthew Flaschen

Reputation: 284796

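Does this do what you want? The ^ anchor ensures the text starts with www, and the $ anchor requires the match to run to the end, so this only works when the URL is the entire string. It will also fail with other subdomains.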

text = text.replace(/^www\.\w+\.\w+$/ig, "http://$&");

EDIT: Fixed thanks to Chris Lutz's comment. I did test earlier, but a strange combo of bugs (missing anchor, unescaped dot, etc.) made it seemingly work. I should reiterate that this is fragile anyway.

Upvotes: 3

JP Alioto

Reputation: 45117

You can use a negative lookbehind assertion. Something like ...

(?<!http:\/\/)(?:www\.example\.com)
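
Lookbehind only became part of JavaScript with ES2018, so the sketch below assumes a runtime that supports it (a current Node.js or browser). The names notAfterHttp, notAfterWordOrSlash and prefix are hypothetical, and the stricter second pattern is an assumption added for comparison, not something from this answer.

```javascript
// Lookbehind as in this answer, with the dots escaped.
const notAfterHttp = /(?<!http:\/\/)www\.example\.com/g;

// Stricter variant (an assumption, not from the answer): reject any
// word character or slash immediately before "www".
const notAfterWordOrSlash = /(?<![\w\/])www\.example\.com/g;

// Hypothetical helper: prepend "http://" to every match of the pattern.
function prefix(text, pattern) {
  return text.replace(pattern, "http://$&");
}

const inputs = [
  "www.example.com",
  "http://www.example.com",
  "anythingwww.example.com",
];

for (const s of inputs) {
  console.log(s, "|", prefix(s, notAfterHttp), "|", prefix(s, notAfterWordOrSlash));
}
```

The first pattern leaves 'http://www.example.com' alone but still rewrites 'anythingwww.example.com', because the lookbehind only rules out the literal text http://. The stricter variant skips both.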

Upvotes: 0

molf

Reputation: 74945

Are you searching for the occurrence of www.example.com in a larger string? Maybe you can be more specific about what you want to match exactly, but something like this may work for you:

text = text.replace(/(\s)(www\.\w+\.\w+)/ig, "$1http://$2");

The problem with \b (which matches word boundaries) is that it also matches between http:// and www, because / is not a word character.
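
The \b behaviour is easy to see in isolation. The following is a small sketch; the names startsAtBoundary and boundaryResults are made up for illustration.

```javascript
// \b matches between a word character and a non-word character
// (or the start/end of the string).
const startsAtBoundary = /\bwww/;

const boundaryResults = {
  // Start of string counts as a boundary.
  bare: startsAtBoundary.test("www.example.com"),
  // "/" is not a word character, so there is a boundary before "www".
  afterSlash: startsAtBoundary.test("http://www.example.com"),
  // "g" and "w" are both word characters, so there is no boundary.
  glued: startsAtBoundary.test("anythingwww.example.com"),
};

console.log(boundaryResults);
// { bare: true, afterSlash: true, glued: false }
```

So \b does rule out 'anythingwww.example.com', but it cannot tell a bare 'www.example.com' from one that follows 'http://'.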

Upvotes: 4

lothar

Reputation: 20209

You can use the ^ anchor to require that the matched text start with www:

echo -e "http://www.example.com\nanythingwww.example.com\nwww.example.com" | grep "^www\.example\.com"
www.example.com

Upvotes: 0
