Adonist

Reputation: 153

SED and GREP showing different results

I'm trying to get the number of requests in a specific time range from my Apache log. I thought this would be easy with sed; however, when I tried the same thing with grep, I realised that grep shows more results than sed.

Here's the grep command I used:

#grep '26/Apr/2017:08:0[0-2]:[0-2][0-4]' access.log 

10.51.32.104 - - [26/Apr/2017:08:00:21 +0100] "GET / HTTP/1.1" 301 762 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
10.51.32.104 - - [26/Apr/2017:08:00:22 +0100] "GET /index.php?action=Login&module=Users HTTP/1.1" 200 6591 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET / HTTP/1.0" 301 1906 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET /index.php?action=Login&module=Users HTTP/1.0" 200 21951 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"

And here's the sed command:

#sed -n '/26\/Apr\/2017:08:00:21/ , /26\/Apr\/2017:08:02:04/p' access.log

10.51.32.104 - - [26/Apr/2017:08:00:21 +0100] "GET / HTTP/1.1" 301 762 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
10.51.32.104 - - [26/Apr/2017:08:00:22 +0100] "GET /index.php?action=Login&module=Users HTTP/1.1" 200 6591 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET / HTTP/1.0" 301 1906 "-" "Mozilla/4.0 (compatible; ipMonitor 10.7)"

So, as you can see, the sed output is missing one request from 172.30.180.113 that matches the pattern.

What did I do wrong? Would another sed option have helped, or is there a better way to do this?

Upvotes: 4

Views: 204

Answers (3)

ghoti

Reputation: 46826

Yes, there's a better way to do this (which I mention at the bottom). Since recommendations would be off-topic for StackOverflow, I'll just respond with an explanation as to what's going on within the code that you've provided.

Your grep command prints every line of input that matches the regular expression you've specified. While this works, it's sometimes difficult to specify ranges purely in regex. (How would you specify a range from Jan 10th to March 2nd?)
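To illustrate why regex ranges are awkward, here is a small sketch (the timestamps are made up for the test, not taken from a real log) that feeds a few second values to the pattern from the question. Each character class constrains a single digit, so `[0-2][0-4]` accepts seconds 00-04, 10-14 and 20-24 rather than a contiguous span of time.

```shell
# The pattern from the question; each bracket expression matches ONE digit.
pattern='26/Apr/2017:08:0[0-2]:[0-2][0-4]'

# Hypothetical timestamps, all within 08:00. printf repeats the format
# once per argument, producing one line per second value.
matched=$(printf '26/Apr/2017:08:00:%s\n' 04 05 14 21 25 30 | grep "$pattern")
count=$(printf '26/Apr/2017:08:00:%s\n' 04 05 14 21 25 30 | grep -c "$pattern")

printf '%s\n' "$matched"
echo "matched $count of 6"
```

Only the lines ending in 04, 14 and 21 are printed. A request at 08:00:25 lies between 08:00:21 and 08:02:04 but is skipped, so the grep output only looks right for the particular lines in the question.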

A sed command can be a tad more complex. Consider the following:

$ sed -n -e '/re/p'

This will print all lines that match the regular expression re. Basically the same as grep.

$ sed -n -e '/re1/,/re2/p'

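This will print all lines starting with the first match of re1 and ending with the next line that matches re2. The range closes as soon as re2 matches once, so if several consecutive lines match re2, only the first of them is printed. This is what the sed script in your question is doing, and it is why the second 08:02:04 line is missing. Note that a range can also print lines that DO NOT match either of the regular expressions: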

$ printf 'one\ntwo\nthree\nfour\n' | sed -ne '/one/,/three/p'
one
two
three
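A variation on the example above, using made-up input where two consecutive lines match the end expression, shows the behaviour that trips up the command in the question:

```shell
# Both "three A" and "three B" match /three/, but the range ends at the
# first line that matches the end address.
out=$(printf 'one\ntwo\nthree A\nthree B\nfour\n' | sed -n '/one/,/three/p')
printf '%s\n' "$out"
```

This prints one, two and three A. The line three B is dropped even though it matches the end expression, exactly like the second 08:02:04 entry in the log.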

If you want to count the lines in your logs that fall within a time range, I recommend an alternative to sed. While sed is great for pattern matching, it doesn't provide tools that can interpret dates. Perl, or gawk, or even bash would provide more functionality, and be easier to understand and debug six months from now when you need to make changes to your code.
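As a rough sketch of that kind of approach (not a complete date parser), awk can pull the timestamp out of each line and compare it as a string. This assumes the standard Apache common/combined log layout, where field 4 looks like `[26/Apr/2017:08:00:21`, and it assumes the whole range sits within one day so that HH:MM:SS values sort correctly as text. The variable names `day`, `from` and `to` and the sample file are made up for the illustration.

```shell
# Hypothetical sample standing in for access.log: the four lines from the
# question (user agents trimmed) plus one made-up line on either side.
log=$(mktemp)
cat > "$log" <<'EOF'
10.51.32.104 - - [26/Apr/2017:07:59:59 +0100] "GET / HTTP/1.1" 200 100
10.51.32.104 - - [26/Apr/2017:08:00:21 +0100] "GET / HTTP/1.1" 301 762
10.51.32.104 - - [26/Apr/2017:08:00:22 +0100] "GET /index.php?action=Login&module=Users HTTP/1.1" 200 6591
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET / HTTP/1.0" 301 1906
172.30.180.113 - - [26/Apr/2017:08:02:04 +0100] "GET /index.php?action=Login&module=Users HTTP/1.0" 200 21951
172.30.180.113 - - [26/Apr/2017:08:02:05 +0100] "GET / HTTP/1.0" 200 100
EOF

count=$(awk -v day='26/Apr/2017' -v from='08:00:21' -v to='08:02:04' '
    {
        ts = substr($4, 2)       # drop the leading "["
        d  = substr(ts, 1, 11)   # date part, e.g. 26/Apr/2017
        t  = substr(ts, 13)      # time part, e.g. 08:00:21
    }
    # String comparison; valid because zero-padded HH:MM:SS sorts as text.
    d == day && t >= from && t <= to { n++ }
    END { print n + 0 }          # n + 0 prints 0 when nothing matched
' "$log")

echo "$count"
rm -f "$log"
```

On the sample this prints 4: both 08:02:04 requests are counted and the lines just outside the range are not. A range that crosses midnight or a month boundary would need real date parsing, for example with gawk's `mktime`.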

Upvotes: 1

alvits

Reputation: 6758

You are quite close to solving it using sed. That is a good start, and I'd encourage you to keep going down that route.

Of course you could use a regex, but it has its limitations. Consider the range 08:00 to 09:59: the regex is easy, 0[89]:[0-5][0-9]. But if the range is 08:45 to 09:30, then regex will not be your friend. Hence my encouragement to use the range as you tried.

The limitation you have seen with sed is that the range ends at the first line that matches the end pattern, so sed stops printing there. But we know that more lines with the same end timestamp may follow.

sed -n '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{p;b};/26\/Apr\/2017:08:02:04/p' access.log

Breaking down the sed commands:

/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{p;b}

This will print the line if it is within the range and then branch to the end of the script, skipping the second command.

/26\/Apr\/2017:08:02:04/p

This will only get executed if the line is outside the range in the previous sed command. It takes care of the extra lines that carry the end timestamp but are not considered within the range by sed.

The same technique can be used with awk.
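Here is a quick check of that script on a cut-down sample (timestamps from the question, plus one made-up line on either side). The script is split across several -e options so that the b command is terminated by a newline rather than by the closing brace, which I believe some sed implementations require; otherwise it is the same two commands. Piping the result to wc -l gives the request count.

```shell
# Hypothetical sample: only the timestamp matters to the sed patterns.
log=$(mktemp)
cat > "$log" <<'EOF'
[26/Apr/2017:07:59:59 +0100] made-up request before the range
[26/Apr/2017:08:00:21 +0100] first request in range
[26/Apr/2017:08:00:22 +0100] second request in range
[26/Apr/2017:08:02:04 +0100] closes the sed range
[26/Apr/2017:08:02:04 +0100] same timestamp, outside the sed range
[26/Apr/2017:08:02:05 +0100] made-up request after the range
EOF

# Plain range from the question: stops at the first 08:02:04 line.
plain=$(sed -n '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/p' "$log" | wc -l)

# Range plus a second command that catches further end-timestamp lines.
fixed=$(sed -n -e '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{p;b' \
               -e '}' \
               -e '/26\/Apr\/2017:08:02:04/p' "$log" | wc -l)

echo "plain range: $((plain)) lines, with extra command: $((fixed)) lines"
rm -f "$log"
```

The plain range yields 3 lines and the extended script yields 4.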

awk '/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{a=NR;print};a!=NR && /26\/Apr\/2017:08:02:04/{print}' access.log

The first command:

/26\/Apr\/2017:08:00:21/,/26\/Apr\/2017:08:02:04/{a=NR;print}

Will print the lines within the range and set variable a to the value of NR (current record number).

The second command:

a!=NR && /26\/Apr\/2017:08:02:04/{print}

Will print the remaining lines that are within range but awk considered outside of range.

Upvotes: 3

miken32

Reputation: 42675

As mentioned in comments, you're searching for a range of expressions, and sed will match all lines from the first match of the start to the first match of the end. As a language of its own, awk provides more flexibility than sed:

start=26/Apr/2017:08:00:21
end=26/Apr/2017:08:02:04
awk -v "s=$start" -v "e=$end" '$0~s{m=1} $0~e{m=0; f=1; print} f&&$0!~e{exit} m' access.log

We've got 4 conditional blocks. First we check for a match on the start and set m. Then we check for a match on the end; if there is one, we unset m, set f, and print the line. The third block fires when f is set and the line no longer matches the end: this indicates that we've passed all the lines matching the end string, so we can quit. The final block checks if m is set, and prints the line if it is.

A more verbose version of the same program:

awk -v "start_date=$start" -v "end_date=$end" '
{
    if ($0 ~ start_date) {
        matching = 1;
    }
    else if ($0 ~ end_date) {
        matching = 0;
        finishing = 1;
        print $0;
    }
    else if (finishing) {
        exit;
    }
    if (matching) {
        print $0;
    }
}
' access.log

Thanks to @alvits for beating me over the head in the comments until I figured out a better solution!

Upvotes: 1
