27560

Reputation: 93

Use AWK to compare two files, and perform conditional statements before printing

I am an AWK novice, and this is by far my most complex AWK attempt to date. I have 2 files, one with scan data (FILE1.csv) and another with scan dates (FILE2.csv). I need to look up each target's scan dates in FILE2, pick the correct date based on which dates are present for that target, and then print only the FILE1 rows whose Last Observed value matches that date. My current script produces no output. Any help is greatly appreciated!

FILE1.csv

Name,Plugin,Plugin Name,First Discovered,Last Observed,Severity
server1.domain,57608,SMB Signing not required,9/19/2020 20:55,12/3/2022 20:39,Medium
server1.domain,71966,Oracle Java SE Multiple Vulnerabilities (January 2014 CPU),4/22/2021 3:08,12/1/2022 3:14,Critical
server1.domain,94138,Oracle Java SE Multiple Vulnerabilities (October 2016 CPU),4/22/2021 3:08,12/8/2022 3:14,Critical
server2.domain,156032,Apache Log4j Unsupported Version Detection,12/25/2021 3:07,12/8/2022 3:07,Critical
server2.domain,156032,Apache Log4j Unsupported Version Detection,8/31/2022 11:48,11/30/2022 10:16,Critical
server2.domain,156103,Apache Log4j 1.2 JMSAppender Remote Code Execution (CVE-2021-4104),12/25/2021 3:07,12/6/2022 3:07,High
server3.domain,164078,Splunk Enterprise and Universal Forwarder < 9.0 Improper Certificate Validation,10/31/2022 3:13,11/30/2022 10:16,High
server3.domain,166960,Tenable Nessus Agent 10.x < 10.2.1 Multiple Vulnerabilities (TNS-2022-22),11/7/2022 3:14,12/3/2022 3:14,High
server3.domain,168362,VMware Tools 10.x / 11.x / 12.x < 12.1.5 DoS (VMSA-2022-0029),12/5/2022 3:14,12/8/2022 3:14,Low

FILE2.csv

Name,LAST_VULN_AGENT_SCAN,LAST_VULN_NONCRED_SCAN,LAST_VULN_CRED_SCAN
server1.domain,12/8/2022 3:14,12/3/2022 20:39,
server2.domain,,12/8/2022 3:07,
server3.domain,,12/3/2022 3:14,12/8/2022 3:14

DESIRED OUTPUT

Name,Plugin,Plugin Name,First Discovered,Last Observed,Severity
server1.domain,94138,Oracle Java SE Multiple Vulnerabilities (October 2016 CPU),4/22/2021 3:08,12/8/2022 3:14,Critical
server2.domain,156032,Apache Log4j Unsupported Version Detection,12/25/2021 3:07,12/8/2022 3:07,Critical
server3.domain,168362,VMware Tools 10.x / 11.x / 12.x < 12.1.5 DoS (VMSA-2022-0029),12/5/2022 3:14,12/8/2022 3:14,Low

CURRENT SCRIPT

awk -F',' 'NR==FNR{a[$2,$3,$4];next} 
    if (a[$2] && a[$4]) {
        if(a[$2] > a[$4]) {
            if ($5 == a[$2])
            print $0;
        }
        else {
            if ($5 == a[$4])
            print $0;
        }
    }
    else if (a[$2]) {
        if ($5 == a[$2])
        print $0;
    }
    else if (a[$4]) {
        if ($5 == a[$4])
        print $0;
    }
    else {
        if ($5 == a[$3])
        print $0;
    }' FILE1.csv FILE2.csv

Edit 1: Here is my if/then logic to help understand what I'm doing

if [ ! -z ${LAST_VULN_AGENT_SCAN} ] && [ ! -z ${LAST_VULN_CRED_SCAN} ]; then
    # AGENT SCAN DATE AND CRED SCAN DATE ARE NOT NULL
    if [ "${AGENT_EPOCH}" -gt "${CRED_EPOCH}" ]; then
        # AGENT SCAN DATE IS MORE RECENT THAN CRED SCAN DATE
        # USE AGENT SCAN DATE TO FILTER COLUMN 5 (Last Observed)
    else
        # CRED SCAN DATE IS MORE RECENT THAN AGENT SCAN DATE
        # USE CRED SCAN DATE TO FILTER COLUMN 5 (Last Observed)
    fi
elif [ ! -z ${LAST_VULN_AGENT_SCAN} ] && [ -z ${LAST_VULN_CRED_SCAN} ]; then
    # AGENT SCAN DATE IS NOT NULL AND CRED SCAN DATE IS NULL
    # USE AGENT SCAN DATE TO FILTER COLUMN 5 (Last Observed)
elif [ -z ${LAST_VULN_AGENT_SCAN} ] && [ ! -z ${LAST_VULN_CRED_SCAN} ]; then
    # CRED SCAN DATE IS NOT NULL AND AGENT SCAN DATE IS NULL
    # USE CRED SCAN DATE TO FILTER COLUMN 5 (Last Observed)
else
    # USE NONCRED SCAN DATE TO FILTER COLUMN 5 (Last Observed)
fi

Upvotes: 0

Views: 131

Answers (1)

markp-fuso

Reputation: 34084

A few issues with OP's current code:

  • while processing the 1st file 3 columns are used as the index for the a[] array (a[$2,$3,$4]) but ...
  • during processing of the 2nd file only 1 column is ever used for referencing the a[] array; net result is that none of the tests will evaluate as 'true'
  • I'm assuming that all relationships between FILE1.csv and FILE2.csv are based on a common Name (eg, server1.domain) so there needs to be some sort of comparison of $1 between the two files; more specifically, array indices should probably be based on $1
  • during processing of the 2nd file we should first test whether an array entry exists (eg, with the in operator) before referencing it; merely referencing a non-existent entry causes awk to create it; this is likely not an issue for this particular process, but it is better to understand, and fix, this now than to carry the habit into later awk scripts and end up with unexpected results
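
The last point can be seen in a minimal sketch (the seen[] array and the server9.domain key are made up for illustration): counting the entries shows that an in test leaves the array alone, while a plain reference quietly adds an empty entry.

```shell
out=$(awk 'BEGIN {
    seen["server1.domain"] = "12/8/2022 3:14"

    # Safe: the "in" operator tests membership without creating an entry
    if ("server9.domain" in seen) print "unexpected"
    n = 0; for (k in seen) n++
    print "after in-test: " n

    # Unsafe: merely referencing seen[...] creates an empty entry
    if (seen["server9.domain"] != "") print "unexpected"
    n = 0; for (k in seen) n++
    print "after reference: " n
}')
echo "$out"
```

The second count is one higher than the first even though nothing was ever assigned to the new key.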

Additional items we need to address:

  • since we'll be dealing with 3 different dates we'll use separate arrays for managing the dates; to keep with OP's column references we'll call them dt2[] (re: $2), dt3[] (re: $3) and dt4[] (re: $4)
  • comparing 2 dates is easier if we convert them to epoch and then compare the epoch values; if the epoch for $2 is greater than the epoch for $4 we'll create an entry in the greater[] array
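
To see why the raw date strings cannot simply be compared, here is a small sketch. It assumes the M/D/YYYY H:MM layout shown in the sample files, and it swaps mktime() for a zero-padded numeric key built by a hypothetical datekey() helper so that it runs in any POSIX awk; the ordering it produces is the same as comparing epoch values.

```shell
out=$(awk '
# Hypothetical helper: turn "M/D/YYYY H:MM" into a sortable number YYYYMMDDHHMM
function datekey(s,    p) {
    split(s, p, "[ /:]")
    return sprintf("%04d%02d%02d%02d%02d", p[3], p[1], p[2], p[4], p[5]) + 0
}
BEGIN {
    older = "9/19/2020 20:55"
    newer = "12/3/2022 20:39"

    # Plain string comparison gets this wrong because "9" sorts after "1"
    r = (older > newer) ? "older wins" : "newer wins"
    print "string compare: " r

    # Comparing the numeric keys gives the chronological answer
    r = (datekey(older) > datekey(newer)) ? "older wins" : "newer wins"
    print "key compare: " r
}')
echo "$out"
```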

Pulling all of this together into our awk script (note that mktime() is a GNU awk extension, so this needs gawk or another awk that provides it):

awk '
BEGIN   { FS=OFS="," }
FNR==NR { if (FNR==1)
             next

          if ($2) dt2[$1]=$2
          if ($3) dt3[$1]=$3
          if ($4) dt4[$1]=$4

          if ($2 && $4) {
             split($2,a,"[ /:]")
             epoch2=mktime(a[3] " " a[1] " " a[2] " " a[4] " " a[5] " 0")

             split($4,a,"[ /:]")
             epoch4=mktime(a[3] " " a[1] " " a[2] " " a[4] " " a[5] " 0")

             if (epoch2 > epoch4)
                greater[$1]
          }
          next
        }
FNR==1  { printme=1 }                                  # set print flag
FNR>1   { printme=0                                    # clear print flag

          if ($1 in dt2 && $1 in dt4) {
             if ($1 in greater) {
                if ($5 == dt2[$1])
                   printme=1
             }
             else if ($5 == dt4[$1]) {
                printme=1
             }
          }
          else if ($1 in dt2) {
             if ($5 == dt2[$1])
                printme=1
          }
          else if ($1 in dt4) {
             if ($5 == dt4[$1])
                printme=1
          }
          else if ($1 in dt3) {
             if ($5 == dt3[$1])
                printme=1
          }
        }
printme                                                # if print flag == 1 then print current line to stdout
' FILE2.csv FILE1.csv

This generates:

Name,Plugin,Plugin Name,First Discovered,Last Observed,Severity
server1.domain,94138,Oracle Java SE Multiple Vulnerabilities (October 2016 CPU),4/22/2021 3:08,12/8/2022 3:14,Critical
server2.domain,156032,Apache Log4j Unsupported Version Detection,12/25/2021 3:07,12/8/2022 3:07,Critical
server3.domain,168362,VMware Tools 10.x / 11.x / 12.x < 12.1.5 DoS (VMSA-2022-0029),12/5/2022 3:14,12/8/2022 3:14,Low
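
One possible variation, offered as a sketch rather than as a replacement for the script above: decide the filter date once per server while reading FILE2.csv and store it in a single array, so the pass over FILE1.csv becomes a one-line test. The pick[] array and datekey() helper are hypothetical names, the zero-padded key stands in for mktime() so the example runs in any POSIX awk, and the data is a trimmed copy of OP's sample files written to a temporary directory.

```shell
tmp=$(mktemp -d)
cat > "$tmp/FILE2.csv" <<'EOF'
Name,LAST_VULN_AGENT_SCAN,LAST_VULN_NONCRED_SCAN,LAST_VULN_CRED_SCAN
server1.domain,12/8/2022 3:14,12/3/2022 20:39,
server2.domain,,12/8/2022 3:07,
server3.domain,,12/3/2022 3:14,12/8/2022 3:14
EOF
cat > "$tmp/FILE1.csv" <<'EOF'
Name,Plugin,Plugin Name,First Discovered,Last Observed,Severity
server1.domain,57608,SMB Signing not required,9/19/2020 20:55,12/3/2022 20:39,Medium
server1.domain,94138,Oracle Java SE Multiple Vulnerabilities (October 2016 CPU),4/22/2021 3:08,12/8/2022 3:14,Critical
server2.domain,156032,Apache Log4j Unsupported Version Detection,12/25/2021 3:07,12/8/2022 3:07,Critical
server2.domain,156103,Apache Log4j 1.2 JMSAppender Remote Code Execution (CVE-2021-4104),12/25/2021 3:07,12/6/2022 3:07,High
server3.domain,166960,Tenable Nessus Agent 10.x < 10.2.1 Multiple Vulnerabilities (TNS-2022-22),11/7/2022 3:14,12/3/2022 3:14,High
server3.domain,168362,VMware Tools 10.x / 11.x / 12.x < 12.1.5 DoS (VMSA-2022-0029),12/5/2022 3:14,12/8/2022 3:14,Low
EOF

out=$(awk '
# Hypothetical helper: "M/D/YYYY H:MM" -> sortable number YYYYMMDDHHMM
function datekey(s,    p) {
    split(s, p, "[ /:]")
    return sprintf("%04d%02d%02d%02d%02d", p[3], p[1], p[2], p[4], p[5]) + 0
}
BEGIN { FS = OFS = "," }
FNR == NR {                                  # 1st file: FILE2.csv
    if (FNR == 1) next                       # skip header
    if ($2 != "" && $4 != "")                # agent and cred: use the later one
        pick[$1] = (datekey($2) > datekey($4)) ? $2 : $4
    else if ($2 != "") pick[$1] = $2         # agent only
    else if ($4 != "") pick[$1] = $4         # cred only
    else if ($3 != "") pick[$1] = $3         # fall back to noncred
    next
}
# 2nd file: FILE1.csv; print the header plus rows whose Last Observed matches
FNR == 1 || (($1 in pick) && $5 == pick[$1])
' "$tmp/FILE2.csv" "$tmp/FILE1.csv")
echo "$out"
rm -rf "$tmp"
```

On this trimmed data the sketch prints the header and the same three rows as the output above.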

Upvotes: 1
