user53029

Reputation: 695

AWK - remove all but the last occurrence of a log file line based on time

I have a file that collects option 82 data from our DHCP server. The file contains lines that are similar in all respects except for the timestamp and the server they came from. I need to remove all "related" lines except for the last occurrence of each, based on time.

My raw file looks like this:

 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

After text processing I need to achieve this:

  Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
  Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
  Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

Here are a few things I have tried so far, but these seem to remove all instances of certain lines, and the finished file is missing data that we need.

 /bin/awk '!_[$9]++' rawfile
 /bin/awk 'NR == FNR {if (z[$9]) y[z[$9]]; z[$9] = FNR; next} !(FNR in y)' rawfile rawfile
 tac rawfile | awk '!seen[$9]++' | tac > finished_file
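A likely cause of the missing data is the choice of field. With awk's default field splitting (runs of blanks separate fields, leading blanks are ignored), $9 in these lines is the | separator, which is the same on every line, so each of the three commands ends up keeping a single line in total. The IP token is $11 (=192.168.1.100, including the leading =). This snippet numbers the fields of the first sample line so that can be checked; it uses only the sample data above:

```shell
# First sample line from the question.
line=' Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a'

# Print every field with its number, as awk's default field splitting sees it.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
```

Field 9 comes out as | and field 11 as =192.168.1.100.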

I'm in no way an expert on awk. I found and tried these by googling, so any help would be greatly appreciated. I'm also open to text processing tools other than awk.

Upvotes: 2

Views: 81

Answers (1)

haukex

Reputation: 3013

As per the discussion in the comments, the input file is actually ordered by timestamps in ascending order, and you want to match on the IP.
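If the file is not already in ascending time order (the sample in the question is not), it can be sorted first. This is a sketch, assuming GNU coreutils sort (the M month key is not part of POSIX sort) and that all entries fall in the same year, since these syslog timestamps carry no year; sort_by_time is a hypothetical helper name:

```shell
# Sample lines in the order given in the question (not sorted by time).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
EOF

# Hypothetical helper: order syslog-style lines by their timestamp.
#   -k1,1M  month abbreviation, Jan < Feb < ... < Dec (GNU sort)
#   -k2,2n  day of month, compared numerically
#   -k3,3   HH:MM:SS, which compares correctly as a plain string
# LC_ALL=C gives English month names and byte-wise string comparison.
# There is no year in the timestamps, so a Dec -> Jan rollover would sort wrongly.
sort_by_time() { LC_ALL=C sort -k1,1M -k2,2n -k3,3 "$@"; }

sort_by_time "$tmp"
```

Its output can be piped into the perl one-liner below instead of naming input.txt as an argument, since perl -n reads standard input when no file is given.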

$ cat input.txt 
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
$ perl -ne '/\bIP\s*=\s*([\d.]+)\b/||next;$x{$1}=$_}{print $x{$_} for sort keys %x' input.txt 
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
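The same result can be had with the tools from the question: the tac | awk | tac attempt works once the key is the IP field ($11) instead of $9. This is a sketch, assuming GNU tac and that the IP is always the 11th whitespace-separated field, which holds for these sample lines but would break if the spacing around "IP =" changed; last_per_ip is a hypothetical helper name:

```shell
# Sample input, already in ascending time order (same lines as input.txt above).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
EOF

# Hypothetical helper: keep only the last line for each IP address.
#   tac reverses the file, so each IP's last line is seen first;
#   '!seen[$11]++' is true only the first time a given $11 value appears;
#   the second tac restores the original order of the surviving lines.
last_per_ip() { tac "$1" | awk '!seen[$11]++' | tac; }

last_per_ip "$tmp"
```

This prints the same three lines as the perl one-liner, but in the order they appear in the file rather than sorted by IP.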

Note: sort keys %x isn't perfect, as it sorts the output by IP address compared as a string, not by time or by position in the original file. If you need the same order as in the original file, please specify, and, as I said in the comments, show a more representative sample of input (and output) data. See also Minimal, Complete, and Verifiable Example.
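For keeping the original file order, a variant of the question's two-pass awk attempt works without tac (which is part of GNU coreutils and may be missing on other systems): the first pass records the line number of the last occurrence of each IP, and the second pass prints only those lines, in file order. This is a sketch under the same assumption that $11 holds the IP token; last_per_ip_inorder is a hypothetical helper name:

```shell
# Sample input (same lines as input.txt above).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
EOF

# Hypothetical helper: read the same file twice.
last_per_ip_inorder() {
  awk 'NR == FNR { last[$11] = FNR; next }  # pass 1: line number of the last line per IP
       FNR == last[$11]                      # pass 2: print only those lines
      ' "$1" "$1"
}

last_per_ip_inorder "$tmp"
```

Because it reads the input twice, this needs a real file rather than a pipe.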

Upvotes: 3
