I used an online regex learning site (regexr) and created something that works, but with my very limited experience of writing regexes I could do with some help/advice.
In IIS10 logs there are fields for time, date, and so on, but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is also match single-digit numbers and names with no letter after the number (see Expected below).
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
I tried to write a regex for IIS10 that blocks requests from a certain user-agent.
Expected:
I expected it to work on single-digit numbers as well as double/triple-digit numbers, with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org
Upvotes: 0
Views: 97
Reputation: 1086
This is an explanation of your regex pattern. If you only want the solution, go directly to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+)
Group 1: matches the word scan followed by a literal -. You escaped the hyphen with a \, but an unescaped - also means a literal hyphen in this position, so the escape is not needed. The hyphen is followed by \d+, which means one or more digits from 0-9; there must be at least one digit. Whatever this group matches is saved in the first capturing group.
(?:\w)+
A non-capturing group containing \w, which matches one character from [A-Za-z0-9_]. The plus sign + after the group means "match the whole group one or more times"; since the group contains only \w, it matches one or more word characters. Note that the non-capturing group is redundant here, and \w+ can be used directly.
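To see what each part of the pattern ends up matching, here is a minimal sketch using Python's re module as a stand-in (it is not the IIS engine, so treat it only as a sanity check). The helper name first_group is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(scan\-\d+)(?:\w)+\.shadowserver\.org")

def first_group(host):
    """Return what group 1 captured, or None if the pattern does not match."""
    m = pattern.search(host)
    return m.group(1) if m else None

for host in [
    "scan-02.shadowserver.org",
    "scan-15n.shadowserver.org",
    "scan-2.shadowserver.org",
]:
    # Show the host next to the content of the first capturing group.
    print(host, "->", first_group(host))
```

Because (?:\w)+ needs at least one character after the digits, a host with a single digit and no letter has nothing left for it to consume, which is why the last host does not match.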
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan matches the word scan in scan-02, and \- matches the hyphen after it. \d+ matches one or more digits, so at first it matches the 02 after scan-, and the value so far is scan-02. Then comes (?:\w)+, where the plus + requires at least one word character. It tries to match the period . but fails, because the period is not a word character. Is it over at this point? No: the regex engine goes back (backtracks) to the previous \d+, which this time matches only the 0 in scan-02, so scan-0 is saved in the first capturing group, and (?:\w)+ then matches the 2 in scan-02. Why does the engine go back to \d+? Because \d+ and (?:\w)+ both use +, requiring at least one digit and at least one word character respectively, so the engine gives back a digit in order to satisfy both.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) matches scan-2, and (?:\w)+ tries to match the period after scan-2 but fails. This is the important point: \d+ has matched only one digit, so it has nothing to give back. The engine then returns to the string scan-2.shadowserver.org and tries to match (scan\-\d+) again, this time starting from the character c, so the s in (scan\-\d+) fails to match c. It keeps trying from each later position, and in the end the whole match fails.
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?)
Group 1 captures the word scan, followed by a literal -, followed by \d+ (one or more digits), followed by an optional lowercase letter [a-z]?. The ? makes the [a-z] part optional; without it, [a-z] would require exactly one lowercase letter.
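As a quick check, the pattern above can be run against the input text from the question. This is only a sketch in Python's re module, not the IIS URL Rewrite engine; the helper name captured and the extra host scan-42ab.shadowserver.org are made up for illustration, and fullmatch is used here on the assumption that the whole host name should match.

```python
import re

# The suggested pattern, unchanged.
solution = re.compile(r"(scan-\d+[a-z]?)\.shadowserver\.org")

def captured(host):
    """Return group 1 if the entire host matches the pattern, else None."""
    m = solution.fullmatch(host)
    return m.group(1) if m else None

hosts = [
    "scan-2.shadowserver.org",
    "scan-02.shadowserver.org",
    "scan-2j.shadowserver.org",
    "scan-02j.shadowserver.org",
    "scan-17w.shadowserver.org",
    "scan-101p.shadowserver.org",
    "scan-42ab.shadowserver.org",  # hypothetical: two letters, should not match
]

for host in hosts:
    print(host, "->", captured(host))
```

All six hosts from the question match, with or without the trailing letter, while the made-up two-letter host is rejected because [a-z]? allows at most one letter.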