f10bit

Reputation: 19770

Awk parsing of unique IP addresses from maillog

Yesterday I asked a question here about a one-liner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other one-liners to awk.

This is the one in question:

grep "pop3\[" maillog | grep "User logged in" |  
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u

I need the list of all unique IP addresses using pop3 to connect to the mail server.

This is an example log entry:

Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in

So I find all the lines containing "pop3[" and keep only those with "User logged in". Next I use egrep with a regex to match IP addresses, and I use sort -u to filter out the duplicate addresses.

This is what I have so far for my awk version:

awk '/pop3\[.*.User logged in/ {ip[$7]=0}
     END {for (address in ip) {print address}}' maillog

This works perfectly, but as always not all log entries are identical. For example, sometimes the IP ends up in the 8th field, as here:

Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
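To see why the IP moves, it can help to print the field count and the candidate fields for both sample lines. A minimal sketch, assuming awk's default field splitting (runs of blanks); the helper name sample_log is hypothetical and just replays the two entries shown above:

```shell
#!/bin/sh
# Hypothetical helper: emit the two sample log entries, one per line
sample_log() {
  printf '%s\n' \
    'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
    'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in'
}

# Print the number of fields, then fields 7 and 8, for each line
sample_log | awk '{ print NF, $7, $8 }'
```

This prints `12 [10.10.10.10] username` and `13 hostname.domain.com [20.20.20.20]`: the client hostname adds one field and pushes the IP from $7 to $8. Note that the field also includes the square brackets.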

What would be the best way to catch those entries with awk as well?

As always, thanks in advance for all the great responses; you've taught me so much already :)

Upvotes: 2

Views: 5837

Answers (3)

f10bit

Reputation: 19770

After seeing and trying these approaches I got a new idea.

belisarius's code does what I asked for, but since it has to do all the regex matching it's not the fastest one, and speed is what I'm after.

So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12, so I just blank the extra field ($7, the client hostname) and squeeze the doubled space out of the rebuilt line, which re-splits it and puts the IP address back in $7. Then I use awk again to delete all duplicate entries:

awk '/pop3\[.*.User logged in/ {if (NF == 13) $7 = ""; gsub(FS "+", FS); print $7}' \
    /var/log/maillog | awk '!($0 in a) {a[$0]; print}'
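Since the only difference between the layouts is one extra field, a single awk pass could also pick $8 directly on 13-field lines instead of blanking $7 and rebuilding the line, and drop duplicates in the same process. This is a sketch of that alternative, not the approach above, and it assumes every matching line has exactly 12 or 13 fields; any other layout would print the wrong field. The function name unique_pop3_ips is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print each POP3 login IP once, in order of first appearance.
# Assumes every matching line has either 12 fields (IP in $7) or 13 fields (IP in $8).
unique_pop3_ips() {
  awk '/pop3\[.*User logged in/ {
         ip = (NF == 13) ? $8 : $7           # field holding the bracketed IP
         ip = substr(ip, 2, length(ip) - 2)  # drop the surrounding [ and ]
         if (!(ip in seen)) {                # first time this address appears
           seen[ip]
           print ip
         }
       }' "$@"
}

# Demo on the two sample layouts; on a real log: unique_pop3_ips /var/log/maillog
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' |
  unique_pop3_ips
```

Skipping the field rebuild and the second awk process should save some work per line, but whether that is measurably faster on a 500 MB log is untested here.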

Ideone link if you want to see the code in action

Upvotes: 0

Dr. belisarius

Reputation: 61016

AWK code

Just match your IP format... but be careful that there are no other formats in the log.

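# Only look at POP3 lines that record a successful login
/pop3\[.*.User logged in/ {
    # match() finds the first "[" followed by a dotted quad and sets RSTART/RLENGTH
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    # Skip the leading "[" and use the bare IP as an array key
    if (where)
        ip[substr($0, RSTART + 1, RLENGTH - 1)] = 0
}

# Array keys are unique, so each address is printed once
END {
    for (address in ip)
        print address
}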

running at ideone
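The substr() arithmetic in that program follows from how match() reports its result: it sets RSTART to the position of the match and RLENGTH to its length, and here the match includes the leading "[". A small sketch to see the values; the helper name show_match is hypothetical and reuses the same regex:

```shell
#!/bin/sh
# Hypothetical helper: print RSTART, RLENGTH and the extracted IP for each input line
show_match() {
  awk '{
    if (match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
      print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
  }'
}

echo 'login: [10.10.10.10] username' | show_match
```

This prints `8 12 10.10.10.10`: the "[" sits at position 8, and the 12-character match minus that bracket leaves the 11-character address. For `pop3[19418]: login: [10.10.10.10]` it prints `21 12 10.10.10.10`, because the bracket after pop3 is followed by a plain number rather than a dotted quad, so it never matches.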

Upvotes: 3

Jonathan Leffler

Reputation: 753665

That looks more like Perl territory than Awk to me:

my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
         $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect: it is alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
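One way to get numeric ordering without changing the script is to pipe its output through sort(1) and compare each dotted component as a number. A sketch, assuming a POSIX sort and plain a.b.c.d addresses; the helper name sort_ips is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sort dotted-quad IPv4 addresses numerically.
# -t. splits each line on dots; -kN,Nn compares only field N, as a number,
# so 9.x sorts before 192.x and 10.0.0.2 before 10.0.0.10.
sort_ips() {
  sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
}

printf '%s\n' 192.1.168.10 9.25.13.26 | sort_ips
```

With the script saved under a (hypothetical) name such as unique_ips.pl, this would be used as `perl unique_ips.pl /var/log/maillog | sort_ips`.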

Upvotes: 0
