Reputation: 169
Experts, after reading up on how to provide a minimal reproducible example, I am posting the question again.
I want to extract the fully qualified hostnames (e.g. dtc4028.ptc.db01.delta.com) and count how many times each individual host appears.
Below is my raw data:
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]
What I tried:
First, I print only the columns I need: 1, 2, 4 and 8.
$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log
Feb 24 dbv0102 RCPT
Feb 24 dbv0102 RCPT
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 after
Feb 21 dbv0102 after
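Column 8 is sometimes RCPT or after because, with awk's default whitespace splitting, the three kinds of line have different numbers of words before "from". A minimal sketch (the function name show_fields is hypothetical) that prints fields 6 to 8 for one sample line of each kind:

```shell
#!/bin/sh
# Hypothetical helper: print fields 6, 7 and 8 (awk's default whitespace
# splitting) for one sample line of each kind taken from the question.
show_fields() {
    awk '{ print $6, $7, $8 }' <<'EOF'
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
EOF
}

show_fields
# prints:
# NOQUEUE: reject: RCPT
# connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
# lost connection after
```

Field 8 holds host[IP] only when field 7 is "from", which is what the RCPT|after filter works around.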
Second, I remove the lines containing RCPT or after, since in those lines column 8 holds that word rather than a hostname. I then strip the [IP] part so that only the hostnames remain, and count their repetitions.
$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log | grep -Ev "RCPT|after" | awk '{print $4}' | cut -d"[" -f1 | sort | uniq -c
2 dtc3024.ptc.db01.delta.com
2 dtc4028.ptc.db01.delta.com
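The pipeline can be sketched as a single awk call, assuming (as in the sample) that the hostname sits in field 8 exactly when field 7 is "from"; the [IP] suffix is stripped with sub() instead of cut. The function names count_from_hosts and sample_log are hypothetical, and sample_log just replays the question's sample lines so the sketch runs on its own:

```shell
#!/bin/sh
# Hypothetical helper: count hosts on lines shaped "... connect from host[ip]".
# With default whitespace splitting, field 7 is "from" only on the
# connect/disconnect lines, and field 8 then holds "host[ip]".
count_from_hosts() {
    awk '$7 == "from" {
        host = $8
        sub(/\[.*/, "", host)   # drop the "[ip]" suffix, like cut -d"[" -f1
        cnt[host]++
    }
    END { for (h in cnt) print cnt[h], h }' "$@"
}

# Hypothetical helper: the sample log lines from the question.
sample_log() {
    cat <<'EOF'
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]
EOF
}

# The for-in order is unspecified, so sort by hostname for stable output.
sample_log | count_from_hosts | sort -k2
```

With a real file, count_from_hosts maillog.log would read it directly.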
What I would like:
I wish this could be written more cleanly with awk alone, rather than the dirty way I am doing it.
Note: Can we extract only the FQDN hostnames, like dtc4028.ptc.db01.delta.com, that appear after the 6th column?
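One possible reading of "only the FQDN hostnames after the 6th column" is to scan fields 7 to NF and keep the first token shaped like name.domain[IP]. The sketch below does that; the token pattern is an assumption based only on the sample lines, and the names count_fqdn_hosts and sample_log are hypothetical. Unlike the column-8 approach, it also counts the RCPT and lost-connection lines, because they carry a hostname in a later field.

```shell
#!/bin/sh
# Hypothetical helper: for each line, find the first field from 7 onwards
# that looks like "name.domain[...]" (assumed pattern), strip the "[...]"
# part, and count the resulting hostname.
count_fqdn_hosts() {
    awk '{
        for (i = 7; i <= NF; i++) {
            if ($i ~ /^[A-Za-z][A-Za-z0-9-]*(\.[A-Za-z0-9-]+)+\[/) {
                host = $i
                sub(/\[.*/, "", host)
                cnt[host]++
                break   # count at most one hostname per line
            }
        }
    }
    END { for (h in cnt) print cnt[h], h }' "$@"
}

# Hypothetical helper: the sample log lines from the question.
sample_log() {
    cat <<'EOF'
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]
EOF
}

sample_log | count_fqdn_hosts | sort -k2
```

If only the connect/disconnect lines should count, the narrower column-based filter is the better fit.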
Upvotes: 1
Views: 260
Reputation: 203684
$ awk -F'[[ ]' '$8=="from"{ cnt[$9]++ } END{ for (host in cnt) print cnt[host], host }' file
2 dtc4028.ptc.db01.delta.com
2 dtc3024.ptc.db01.delta.com
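This works because -F'[[ ]' makes both a space and a "[" act as field separators, so on the connect/disconnect lines field 8 is "from" and field 9 is the bare hostname, with the IP already split off into field 10. A quick way to check the numbering on one sample line (the function name number_fields is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print each field's number and value when fields are
# split on either a space or a "[" (the same -F'[[ ]' as in the answer).
number_fields() {
    printf '%s\n' "$1" | awk -F'[[ ]' '{ for (i = 1; i <= NF; i++) print i, $i }'
}

number_fields 'Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]'
# prints, among others:
# 8 from
# 9 dtc4028.ptc.db01.delta.com
# 10 172.12.78.81]
```

On the RCPT lines field 8 is "reject:", and on the lost-connection lines it is "connection", so the $8=="from" test skips them.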
Upvotes: 2
Reputation: 133538
Based on your shown samples, could you please try the following. Written and tested in GNU awk.
awk '
match($0,/from .*com\[/){
  count[substr($0,RSTART+5,RLENGTH-6)]++
}
END{
  for(key in count){
    print count[key],key
  }
}
' Input_file
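Because the regex from .*com\[ also matches the "RCPT from host[" and "after CONNECT from host[" lines, this version counts those lines too, so on the sample its totals differ from the $8=="from" approach. A self-contained run against the question's sample lines (the helper names count_match_hosts and sample_log are hypothetical; the output is sorted because the for-in order is unspecified):

```shell
#!/bin/sh
# The answer's awk program, wrapped in a hypothetical function.
count_match_hosts() {
    awk '
    match($0, /from .*com\[/) {
        # skip "from " (5 chars) and drop the trailing "[" (1 char)
        count[substr($0, RSTART + 5, RLENGTH - 6)]++
    }
    END { for (key in count) print count[key], key }' "$@"
}

# Hypothetical helper: the sample log lines from the question.
sample_log() {
    cat <<'EOF'
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <[email protected]>: Sender address rejected: Access denied; from=<[email protected]> to=<[email protected]> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]
EOF
}

sample_log | count_match_hosts | sort -k2
# prints:
# 2 dtc3024.ptc.db01.delta.com
# 2 dtc4023.ptc.db01.delta.com
# 4 dtc4028.ptc.db01.delta.com
```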
Explanation: a detailed explanation of the above.
awk '                                       ##Starting awk program from here.
match($0,/from .*com\[/){                   ##Using match function to match regex from .*com\[
  count[substr($0,RSTART+5,RLENGTH-6)]++    ##On a match, RSTART is the start position of the matched text and RLENGTH its length; RSTART+5 skips "from " and RLENGTH-6 also drops the trailing "[", leaving only the hostname, whose counter is incremented.
}
END{                                        ##Starting END block of this program from here.
  for(key in count){                        ##Traversing through count array here.
    print count[key],key                    ##Printing the count (value) followed by the hostname (key).
  }
}
' Input_file                                ##Mentioning Input_file name here.
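To see the RSTART/RLENGTH arithmetic on a concrete line, a small sketch (the function name show_match is hypothetical) prints both variables next to the extracted substring. Here "from dtc4028.ptc.db01.delta.com[" is 5 + 26 + 1 = 32 characters long, so RLENGTH-6 is exactly the 26-character hostname.

```shell
#!/bin/sh
# Hypothetical helper: print RSTART, RLENGTH and the extracted hostname
# for a single log line passed as the first argument.
show_match() {
    printf '%s\n' "$1" | awk '
    match($0, /from .*com\[/) {
        print RSTART, RLENGTH, substr($0, RSTART + 5, RLENGTH - 6)
    }'
}

show_match 'Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]'
# prints: 55 32 dtc4028.ptc.db01.delta.com
```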
Upvotes: 3