Reputation: 153
I'm trying to count the requests within a specific time range in my Apache log. I thought this would be easy with sed, but when I tried the same with grep, I realised that grep shows more results than sed.
Here's the grep command I used:
#grep '26/Apr/2017:08:0[0-2]:[0-2][0-4]' access.log
10.51.32.104 - - [26/Apr/2017:08:00:21 +0100] "GET / HTTP/1.1" 301 762 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
10.51.32.104 - - [26/Apr/2017:08:00:22 +0100] "GET /index.php?action=Login&module=Users HTTP/1.1" 200 6591 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET / HTTP/1.0" 301 1906 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET /index.php?action=Login&module=Users HTTP/1.0" 200 21951 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"
And here's the sed command:
#sed -n '/26\/Apr\/2017:08:00:21/ , /26\/Apr\/2017:08:02:04/p' access.log
10.51.32.104 - - [26/Apr/2017:08:00:21 +0100] "GET / HTTP/1.1" 301 762 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
10.51.32.104 - - [26/Apr/2017:08:00:22 +0100] "GET /index.php?action=Login&module=Users HTTP/1.1" 200 6591 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET / HTTP/1.0" 301 1906 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"
As you can see, the sed output is missing one request from 172.30.180.113 that matches the pattern.
What did I do wrong? Would another sed option have helped, or is there a better way to do this?
Upvotes: 4
Views: 204
Reputation: 46826
Yes, there's a better way to do this (which I mention at the bottom). Since recommendations would be off-topic for StackOverflow, I'll just respond with an explanation as to what's going on within the code that you've provided.
Your grep command prints every line of input that matches the regular expression you've specified. While this works, it's sometimes difficult to specify ranges purely in regex. (How would you specify a range from Jan 10th to March 2nd?)
A sed command can be a tad more complex. Consider the following:
$ sed -n -e '/re/p'
This will print all lines that match the regular expression re. Basically the same as grep.
$ sed -n -e '/re1/,/re2/p'
This will print all lines starting with the first match of re1 and ending with the first subsequent match of re2. This is what the sed script in your question is doing, and it is why the second 08:02:04 line is missing: the range closes at the first line that matches the end pattern. Note that this also has the potential to print lines that DO NOT match either of the regular expressions:
$ printf 'one\ntwo\nthree\nfour\n' | sed -ne '/one/,/three/p'
one
two
three
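The same rule accounts for the line missing from the question's output. A minimal sketch with made-up input (the words start, middle, end and after are placeholders, not taken from the log) where two consecutive lines match the end pattern:

```shell
# Two consecutive lines match /end/; only the first one closes the range,
# so "end 2" is not printed.
out=$(printf 'start\nmiddle\nend 1\nend 2\nafter\n' | sed -n '/start/,/end/p')
printf '%s\n' "$out"
```

Only start, middle and end 1 are printed, which mirrors the single 08:02:04 line in the sed output above.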
If you want to count the log lines within a time range, I recommend an alternative to sed. While sed is great for pattern matching, it doesn't provide tools that can interpret dates. Perl, gawk, or even bash would provide more functionality and be easier to understand and debug six months from now when you need to change your code.
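As a rough illustration of comparing timestamps instead of matching patterns, here is a sketch in plain awk. It assumes all entries of interest fall on a single day, that the timestamp is field 4 (as in the combined log format shown in the question), and that times are zero-padded so string comparison orders them correctly. The sample lines and the variable names day, from and to are invented for the example.

```shell
# Hypothetical sample in the Apache log layout used in the question.
log='10.0.0.1 - - [26/Apr/2017:07:59:59 +0100] "GET /a HTTP/1.1" 200 1
10.0.0.1 - - [26/Apr/2017:08:00:21 +0100] "GET /b HTTP/1.1" 200 1
10.0.0.2 - - [26/Apr/2017:08:02:04 +0100] "GET /c HTTP/1.1" 200 1
10.0.0.2 - - [26/Apr/2017:08:02:04 +0100] "GET /d HTTP/1.1" 200 1
10.0.0.3 - - [26/Apr/2017:08:02:05 +0100] "GET /e HTTP/1.1" 200 1'

# Field 4 looks like "[26/Apr/2017:08:00:21". Split it on ":" and compare
# the HH:MM:SS part as a string against the inclusive bounds.
count=$(printf '%s\n' "$log" | awk -v day='26/Apr/2017' -v from='08:00:21' -v to='08:02:04' '
  {
    split($4, t, ":")
    hms = t[2] ":" t[3] ":" t[4]
  }
  t[1] == ("[" day) && hms >= from && hms <= to { n++ }
  END { print n + 0 }
')
echo "$count"
```

Because every line is tested on its own, repeated timestamps at either end of the range are all counted, and the bounds do not have to appear in the log at all.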
Upvotes: 1
Reputation: 6758
You are quite close to solving it using sed. That is a good start, and I encourage you to keep going that route.
Of course you could use a regex, but it has its limitations. For the range 08:00 to 09:59 the regex is easy: 0[89]:[0-5][0-9]. But if the range is 08:45 to 09:30, a regex will not be your friend. Hence my encouragement to use the range as you tried.
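To give a feel for how quickly this grows, here is one possible extended regex for 08:45 to 09:30, tried against a few made-up timestamps. It is a sketch and not the only way to write it; the hour and minute are anchored between colons so the minutes and seconds fields cannot match by accident.

```shell
# Hypothetical timestamps, one per line, standing in for log lines.
times='26/Apr/2017:08:44:59
26/Apr/2017:08:45:00
26/Apr/2017:08:59:59
26/Apr/2017:09:30:59
26/Apr/2017:09:31:00'

# 08:45-08:59 and 09:00-09:30 each need their own alternative.
matched=$(printf '%s\n' "$times" | grep -cE ':(08:(4[5-9]|5[0-9])|09:([0-2][0-9]|30)):')
echo "$matched"
```

Every change to either bound means reworking the alternatives by hand, which is the maintenance cost the paragraph above warns about.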
The limitation you have seen with sed is that once the end of the range has been matched, sed no longer treats the following lines as part of the range. But we know there may be more lines that carry the same end timestamp.
sed -n '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{p;b};/26\/Apr\/2017:08:02:04/p' access.log
Breaking down the sed commands:
/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{p;b}
This will print (p) the line if it is within the range and then branch (b) to the end of the sed commands.
/26\/Apr\/2017:08:02:04/p
This is only executed for lines outside the range matched by the previous sed command. It takes care of the extra lines that carry the end timestamp but are not considered within the range by sed.
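The two commands can be tried on a small made-up input (start, middle, end and after are placeholder words). In this sketch the script is written one command per line instead of as a one-liner, on the assumption that some non-GNU seds may not accept a closing brace directly after b; the behaviour is intended to be the same.

```shell
# First command: print lines inside the range, then skip the rest of the script.
# Second command: print further lines that match the end pattern after the
# range has already closed.
out=$(printf 'before\nstart\nmiddle\nend 1\nend 2\nafter\n' | sed -n '
/start/,/end/{
p
b
}
/end/p')
printf '%s\n' "$out"
```

Here end 2 is printed by the second command. Without the b, end 1 would be printed twice, once by each command.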
The same technique can be used with awk.
awk '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{a=NR;print};a!=NR && /26\/Apr\/2017:08:02:04/{print}' access.log
The first command:
/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{a=NR;print}
Will print the lines within the range and set variable a to the value of NR (the current record number).
The second command:
a!=NR && /26\/Apr\/2017:08:02:04/{print}
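Will print the remaining lines that are within range but that awk considered outside of the range. The a!=NR test prevents a line already printed by the first command from being printed twice.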
Upvotes: 3
Reputation: 42675
As mentioned in comments, you're searching for a range of expressions, and sed will match all lines from the first match of the start to the first match of the end. As a language of its own, awk provides more flexibility than sed:
start=26/Apr/2017:08:00:21
end=26/Apr/2017:08:02:04
awk -v "s=$start" -v "e=$end" '$0~s{m=1} $0~e{m=0; f=1; print} f&&$0!~e{exit} m' access.log
We've got 4 conditional blocks. First we check for a match on the start and set m. Then we check for a match on the end: we unset m, set f, and print the line. The next check is whether f is set and the line does not match the end; this indicates that we've finished all the matches for the end string and can quit. The final block checks if m is set, and prints the line if it is.
A more verbose version of the same program:
awk -v "start_date=$start" -v "end_date=$end" '
{
    if ($0 ~ start_date) {
        matching = 1;
    }
    else if ($0 ~ end_date) {
        matching = 0;
        finishing = 1;
        print $0;
    }
    else if (finishing) {
        exit;
    }
    if (matching) {
        print $0;
    }
}
' access.log
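Since the original question was about counting requests, here is a sketch that runs the one-line version on a shortened, made-up log and pipes the result to wc -l. The sample lines are invented; start and end are the same variables as above.

```shell
# Hypothetical, shortened log lines; the end timestamp appears twice.
log='[26/Apr/2017:08:00:20] GET /a
[26/Apr/2017:08:00:21] GET /b
[26/Apr/2017:08:01:00] GET /c
[26/Apr/2017:08:02:04] GET /d
[26/Apr/2017:08:02:04] GET /e
[26/Apr/2017:08:02:05] GET /f'

start=26/Apr/2017:08:00:21
end=26/Apr/2017:08:02:04

# Same program as the one-liner above; tr strips the padding that some
# wc implementations add.
count=$(printf '%s\n' "$log" |
  awk -v "s=$start" -v "e=$end" '$0~s{m=1} $0~e{m=0; f=1; print} f&&$0!~e{exit} m' |
  wc -l | tr -d ' ')
echo "$count"
```

Both 08:02:04 lines are counted, and awk exits at the first line after them, so the rest of a large log is never read.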
Thanks to @alvits for beating me over the head in the comments until I figured out a better solution!
Upvotes: 1