peacemaker

Reputation: 71

Sort Logfile for unique SRC + DST IPs

I would like to sort my log file (~5 GB) for unique connection events, i.e. one line per unique (SRC_IP + DST_IP) pair, while keeping the timestamp and the other information.


Example:

1    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.2"...
2    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.1" dstip="10.10.10.2"...
3    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.1"...
4    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.2"...
5    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.2"...

The output events should be:

1    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.2"...
2    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.1" dstip="10.10.10.2"...
3    Feb 5 14:59:00 initf="eth0" outift="eth1" srcip="192.168.0.2" dstip="10.10.10.1"...

because each combination of src + dst IP should appear only once. I tried this with sort -uk column, but it doesn't work as intended. Also, the column positions of the src and dst IPs are not consistent: they shift sometimes because, depending on the out-interface, the dstmac is included or not.

Maybe an AWK script could do the trick?

EDIT

Since karakfa made a good suggestion to solve this with awk, I am currently trying to replace [$7,$8] with regular expressions:

awk '!a[regexpression for src ip, regexpression for dst ip]++' file

Upvotes: 0

Views: 26

Answers (1)

karakfa

Reputation: 67467

Assuming there are no spaces in the first 8 field values, this will print the first occurrence of each key combination (srcip and dstip are fields 7 and 8 in your sample).

$ awk '!a[$7,$8]++' file

This doesn't require sorted input and won't change the order itself; you can pipe the output into sort to get your desired order. If the field order is not fixed, you can do something like this:

$ awk '{for(i=1;i<=NF;i++) if($i~/^srcip=/) s=$i; else if($i~/^dstip=/) d=$i}
       !a[s,d]++;
       {s=d=""}' file
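A variant closer to the regex idea from the question could use match() on the whole line instead of looping over the fields. This is only a minimal sketch, assuming the values are always double-quoted as in the sample; the names dedup_by_match and seen are illustrative and not from the original. Note that the pattern would also match inside a longer key that happens to end in srcip= or dstip=.

```shell
# Hypothetical helper: keep the first line for each srcip/dstip pair,
# locating both fields by regex rather than by column position.
dedup_by_match() {
    awk '{
        s = d = ""                                   # reset the key parts for every record
        # match() sets RSTART/RLENGTH to the position and length of the hit
        if (match($0, /srcip="[^"]*"/)) s = substr($0, RSTART, RLENGTH)
        if (match($0, /dstip="[^"]*"/)) d = substr($0, RSTART, RLENGTH)
    }
    !seen[s, d]++' "$@"                              # print only the first line per key
}
```

Called as `dedup_by_match file`, it reads the log once and keeps the input order, just like the commands above.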

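Note that records with a missing srcip or dstip are grouped as well (the missing part of the key is just an empty string), so only the first record of each such group is printed. You may want to print all of those individually.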
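One way to print the incomplete records individually might look like the sketch below. It assumes that "incomplete" means either srcip or dstip is absent from the line; dedup_by_ip_pair and seen are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: deduplicate on the srcip/dstip pair, but always
# print records where one of the two fields is missing.
dedup_by_ip_pair() {
    awk '{
        s = d = ""                                   # reset the key parts for every record
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^srcip=/)      s = $i
            else if ($i ~ /^dstip=/) d = $i
        }
        if (s == "" || d == "") { print; next }      # incomplete record: never deduplicated
        if (!seen[s, d]++) print                     # first occurrence of a complete pair
    }' "$@"
}
```

Usage is the same as before, for example `dedup_by_ip_pair file | sort` if a particular output order is wanted.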

Upvotes: 1
