Reputation: 259
I want a regular expression that matches URLs that do not contain a specific word in their domain name, regardless of whether that word appears in the query string or in subdirectories of the domain. It also doesn't matter how the URL starts, for example with http, ftp, https, or none of them. I found the expression ^((?!foo).)*$ but I don't know how to change it to fit these conditions. These are the accepted URLs for the word "foo":
whatever.whatever.whatever/foo/pic
whatever.whatever.whatever?sdfd="foo"
and these are not accepted:
whatever.whateverfoo.whatever
whatever.foowhatever.whatever
whatever.foo.whatever.whatever
whatever.whatever.foo.whatever
Upvotes: 0
Views: 1334
Reputation: 4886
Here's a regex that will match the cases that you want to reject:
(?:.+://){0,1}(?<subdomain>[^.]+\.){0,1}(?<domain>[^.]*whatever[^.]*\.)(?<top>[^.]+).*
(?: ) is a non-capturing group
(?<groupName> ) is a named group (useful for testing; in regexhero you can see what is being captured by the group)
{0,1} means 0 or 1
. means any character except new line
[^.] means any character except "."
* means 0 or more
+ means 1 or more; for example, .+ means one or more of "any character"
\. escapes the special character .
See http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
you can try it here: http://regexhero.net/tester/
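To see what the named groups capture outside of an online tester, here is a minimal Java sketch. It assumes the placeholder word in the pattern above is replaced by foo, and the class name NamedGroupDemo and the helper domainLabel are made up for illustration. Java accepts the same (?<name> ) syntax for the constructs used here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // The pattern from the answer, with "foo" substituted as the word to look for (an assumption).
    static final Pattern REJECT = Pattern.compile(
        "(?:.+://){0,1}(?<subdomain>[^.]+\\.){0,1}(?<domain>[^.]*foo[^.]*\\.)(?<top>[^.]+).*");

    // Hypothetical helper: returns what the named group "domain" captured,
    // or null when the pattern is not found in the input.
    static String domainLabel(String url) {
        Matcher m = REJECT.matcher(url);
        return m.find() ? m.group("domain") : null;
    }

    public static void main(String[] args) {
        // The label containing "foo" is captured, including its trailing dot.
        System.out.println(domainLabel("whatever.foowhatever.whatever"));
        // "foo" appears only in the query string, so nothing is captured.
        System.out.println(domainLabel("whatever.whatever.whatever?sdfd=\"foo\""));
    }
}
```

The pattern only recognises foo when a dot follows the label that contains it, so inputs with dots inside the path are worth testing separately.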
Upvotes: 0
Reputation: 17494
Try this (explanation):
^(?:(?!foo).)*?[\/\?]
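What this means is basically: match from the start of the string up to the first / or ?, as long as foo does not appear anywhere before it.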
The precise syntax may vary depending on your programming language/editor. The explanation link shows the PHP example. The regex elements I've used are pretty common, so it should work for you. If not, let me know.
This regex can only be matched against a single URL at a time. So if you are trying this in regex101, don't enter all URLs at once.
Update: Example in Java (now using turner instead of foo):
Pattern p = Pattern.compile("^(?:(?!turner).)*?[\\/\\?].*");
System.out.println(p.matcher(
"i.cdn.turner.com/cnn/.e/img/3.0/1px.gif").matches());
System.out.println(p.matcher(
"www.facebook.com/plugins/like.php?href=http%3A%2F%2F"
+ "www.facebook.com%2Fturnerkjljl").matches());
Output:
false
true
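The pattern above needs a / or ? somewhere in the input and stops at the slashes of a leading http://. A possible variation that tolerates both cases is sketched below. The class HostWordFilter and the method accepts are hypothetical names, and the scheme sub-pattern is an assumption about what counts as a scheme. The optional scheme uses a possessive quantifier (?+) so the engine cannot give it back and re-test the lookahead from the start of the string.

```java
import java.util.regex.Pattern;

class HostWordFilter {
    // Hypothetical helper: true when `word` does not occur before the first '/' or '?'
    // that follows an optional scheme such as http://, https:// or ftp://.
    static boolean accepts(String url, String word) {
        Pattern p = Pattern.compile(
            "^(?:[A-Za-z][A-Za-z0-9+.-]*://)?+"         // optional scheme, possessive
            + "(?![^/?]*" + Pattern.quote(word) + ")"   // word must not precede the first / or ?
            + ".*$");
        return p.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("i.cdn.turner.com/cnn/.e/img/3.0/1px.gif", "turner")); // false
        System.out.println(accepts("https://whatever.foo.whatever.whatever", "foo"));     // false
        System.out.println(accepts("whatever.whatever.whatever/foo/pic", "foo"));         // true
        System.out.println(accepts("whatever.whatever.whatever", "foo"));                 // true
    }
}
```

Pattern.quote keeps the word literal, so a word containing regex metacharacters does not change the meaning of the pattern.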
Upvotes: 1
Reputation: 351
Here is your regex in Java:
"^[^/?]+(?<!foo)"
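Explanation: starting from the beginning, the pattern consumes characters that are not / or ?. When it reaches one of those two characters (or the end of the input), the negative lookbehind checks whether the text just consumed ends in foo. Keep in mind that the lookbehind only inspects the characters immediately before the current position and that the engine can backtrack to a shorter match, so test it against your own inputs. This is for Java; the regex syntax will vary from language to language.
With grep (in a Unix shell or shell script) you have to negate the match of the following regex instead, for example with grep -v: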
"^[^/?]+foo"
Upvotes: 0
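The "negate the match" idea from the grep variant in the last answer carries over to Java as well. The sketch below is only an illustration: the class NegatedMatch and the method accepted are made-up names, and the pattern is the grep one used unchanged with find().

```java
import java.util.regex.Pattern;

class NegatedMatch {
    // Same pattern as the grep variant: "foo" somewhere before the first '/' or '?'.
    static final Pattern FOO_IN_HOST = Pattern.compile("^[^/?]+foo");

    // Hypothetical helper: a URL is accepted when the pattern is NOT found.
    static boolean accepted(String url) {
        return !FOO_IN_HOST.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(accepted("whatever.whatever.whatever/foo/pic")); // true
        System.out.println(accepted("whatever.whateverfoo.whatever"));      // false
        System.out.println(accepted("whatever.foo.whatever.whatever"));     // false
    }
}
```

Because [^/?]+ needs at least one character before foo, a host that begins with foo is not caught, and a leading scheme such as http:// ends the scan at its first slash. Both cases would need extra handling.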