Joe Smalley

Reputation: 313

regex string does not contain substring

I am trying to match a string that does not contain a particular substring.

My string always starts with "http://www.domain.com/".

The substring I want to exclude from matches is ".a/", which comes right after that prefix (it is a folder name in the URL path).

There will be more characters in the string after the substring I want to exclude.

For example:

"http://www.domain.com/.a/test.jpg" should not be matched.

But "http://www.domain.com/test.jpg" should be matched.

Upvotes: 17

Views: 39236

Answers (4)

codaddict

Reputation: 454960

Use a negative lookahead assertion:

^http://www\.domain\.com/(?!\.a/).*$


The part (?!\.a/) fails the match if the prefix http://www.domain.com/ is immediately followed by the string .a/.
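
As a quick check of the pattern above, here is a minimal sketch in Python. Python is only my choice for illustration, and the helper name `is_allowed` is hypothetical; the pattern itself is copied unchanged from the answer.

```python
import re

# Pattern from the answer above; (?!\.a/) is the negative lookahead
# that rejects URLs whose path starts with ".a/".
PATTERN = re.compile(r"^http://www\.domain\.com/(?!\.a/).*$")


def is_allowed(url: str) -> bool:
    """Return True if url starts with the domain prefix and is not under .a/."""
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.domain.com/test.jpg",
        "http://www.domain.com/.a/test.jpg",
        "http://www.domain.com/.a",
        "http://www.otherdomain.com/test.jpg",
    ]
    for url in samples:
        print(url, is_allowed(url))
```

Note that the lookahead only looks at what comes directly after the prefix, so a URL such as http://www.domain.com/.a (no trailing slash) is still accepted.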

Upvotes: 29

nonopolarity

Reputation: 150976

If you don't want to use lookahead and prefer plain regexes, you can simply check that the string matches your domain but does not match the domain followed by .a/:

<?php

function foo($s) {

    // Dots are escaped so they match a literal "." rather than any character.
    $regexDomain = '{^http://www\.domain\.com/}';
    $regexDomainBadPath = '{^http://www\.domain\.com/\.a/}';

    return preg_match($regexDomain, $s) && !preg_match($regexDomainBadPath, $s);
}

var_dump(foo('http://www.domain.com/'));
var_dump(foo('http://www.otherdomain.com/'));

var_dump(foo('http://www.domain.com/hello'));
var_dump(foo('http://www.domain.com/hello.html'));
var_dump(foo('http://www.domain.com/.a'));
var_dump(foo('http://www.domain.com/.a/hello'));
var_dump(foo('http://www.domain.com/.b/hello'));
var_dump(foo('http://www.domain.com/da/hello'));

?>

Note that http://www.domain.com/.a will pass the test, because .a is not followed by a /.

Upvotes: 0

M'vy

Reputation: 5774

I would try:

^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$

You want to match your domain, followed either by a character that is not a ., or by a . followed by a character that is not an a. (You can also add your / after that if needed.)
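
To see what this pattern accepts and rejects, here is a small Python sketch. Python and the helper name `matches` are my own choices for illustration; the pattern is the one above, with the optional `\/` escapes dropped since Python does not need them.

```python
import re

# Same pattern as above: after the prefix, require either a character that
# is not ".", or a "." followed by a character that is not "a".
PATTERN = re.compile(r"^http://www\.domain\.com/([^.]|\.[^a]).*$")


def matches(url: str) -> bool:
    """Return True if the character-class pattern accepts url."""
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.domain.com/test.jpg",
        "http://www.domain.com/.a/test.jpg",
        "http://www.domain.com/.b/hello",
        "http://www.domain.com/.abc/test.jpg",
        "http://www.domain.com/",
    ]
    for url in samples:
        print(url, matches(url))
```

As far as I can tell, without the trailing / the pattern is stricter than the question asks for: it also rejects any path starting with .a (such as .abc/), and it rejects the bare prefix because the group needs at least one character after the slash.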

Upvotes: 0

Ingo

Reputation: 36329

My advice in such cases is not to construct overly complicated regexes with negative lookahead assertions or the like.
Keep it simple and stupid!
Do two matches: one for the positives, then sort out the negatives afterwards (or the other way around). Most of the time the regexes become simpler, if not trivial, and your program gets clearer.
For example, to extract all lines with foo, but not foobar, I use:

grep foo | grep -v foobar
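
Applied to the URLs from the question, the same two-step idea might look like the Python sketch below. This is only an illustration under my own assumptions: the names `PREFIX`, `EXCLUDED` and `keep` are hypothetical, and Python is just a convenient stand-in for the two grep calls.

```python
import re

# Step 1 pattern: the positives (anything under the domain).
PREFIX = re.compile(r"^http://www\.domain\.com/")
# Step 2 pattern: the negatives to throw away (anything under .a/).
EXCLUDED = re.compile(r"^http://www\.domain\.com/\.a/")


def keep(urls):
    """Keep URLs under the domain, then drop those under the .a/ folder."""
    positives = [u for u in urls if PREFIX.match(u)]
    return [u for u in positives if not EXCLUDED.match(u)]


if __name__ == "__main__":
    urls = [
        "http://www.domain.com/test.jpg",
        "http://www.domain.com/.a/test.jpg",
        "http://www.otherdomain.com/test.jpg",
    ]
    for url in keep(urls):
        print(url)
```

Each pattern stays trivial to read on its own, which is the point of splitting the check in two.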

Upvotes: 9
