AWK to pick FQDN hostname conditionally from the File

Question

I am reposting this question after reading how to provide a minimal reproducible example.

I want to extract the fully qualified hostname (e.g. dtc4028.ptc.db01.delta.com) from each log line and count how many times each host appears.

Below is my raw data:

Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 : Sender address rejected: Access denied; from= to= proto=ESMTP helo=
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 : Sender address rejected: Access denied; from= to= proto=ESMTP helo=
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]

What I tried:

First, I print only the columns I want: 1, 2, 4 and 8.

$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log
Feb 24 dbv0102 RCPT
Feb 24 dbv0102 RCPT
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 after
Feb 21 dbv0102 after

Second, I drop the RCPT and after rows, because field 8 does not hold the hostname on those lines. I then strip the bracketed IP address so that only the hostnames remain, and count their repetitions.

$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log| egrep -v "RCPT|after" | awk '{print $4}'| cut -d"[" -f1 | uniq -c
      2 dtc4028.ptc.db01.delta.com
      2 dtc3024.ptc.db01.delta.com

What I want:

Can this be written more cleanly in awk alone, rather than the messy way I am doing it now?

Note: Is it possible to get only the FQDN hostnames, such as dtc4028.ptc.db01.delta.com, which appear after the 6th column?

RavinderSingh13 · Accepted Answer

Based on the samples shown, please try the following. It was written and tested in GNU awk.

awk '
match($0,/from .*com\[/){
  count[substr($0,RSTART+5,RLENGTH-6)]++
}
END{
  for(key in count){
    print count[key],key
  }
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                      ##Starting awk program from here.
match($0,/from .*com\[/){                  ##Matching the regex from .*com\[ against the whole line; the block runs only when it matches.
  count[substr($0,RSTART+5,RLENGTH-6)]++   ##match sets RSTART (start of the match) and RLENGTH (its length); RSTART+5 skips "from " and RLENGTH-6 also drops the trailing [, so the hostname is the array key and its count is incremented.
}
END{                                       ##Starting END block of this program from here.
  for(key in count){                       ##Traversing through count array here.
    print count[key],key                   ##Printing the count (value) followed by the hostname (key).
  }
}
' Input_file                               ##Mentioning Input_file name here.
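
The regex in the answer relies on every hostname ending in .com. As a hedged variant, the sketch below instead takes whatever text without spaces or brackets sits between "from " and the next "[", and keeps it only if it contains a dot, so that entries such as unknown[...] are skipped. The function name count_hosts, the dot test, and the final sort are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: count FQDN hostnames that follow "from " in Postfix-style logs.
# Reads the files given as arguments, or standard input when none are given.
count_hosts() {
  awk '
    # Leftmost "from <text>[" where <text> has no spaces or brackets.
    match($0, /from [^[ ]+\[/) {
      # Skip the 5 characters of "from " and drop the trailing "[".
      host = substr($0, RSTART + 5, RLENGTH - 6)
      # Assumption: a name counts as an FQDN if it contains at least one dot.
      if (index(host, ".")) count[host]++
    }
    END {
      for (host in count) print count[host], host
    }
  ' "$@" | sort -k1,1nr -k2,2   # highest count first, then by hostname
}
```

Called as count_hosts maillog.log on the sample data from the question, this should report 4 for dtc4028.ptc.db01.delta.com and 2 each for dtc3024.ptc.db01.delta.com and dtc4023.ptc.db01.delta.com, because the RCPT and "lost connection" lines are counted as well. The sort is there only because the order of a for (key in array) loop in awk is unspecified.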
