Johnathan Maravilla

Reputation: 11

Remove Rows From CSV Where A Specific Column Matches An Input File

I have a CSV that contains multiple columns and rows [File1.csv].

I have another CSV file (just one column) that lists specific words [File2.csv].

I want to be able to remove rows from File1 if any column matches any of the words listed in File2.

I originally used this:

 grep -v -F -f File2.csv File1.csv > File3.csv

This worked, to a certain extent. The issue I ran into was with columns that had more than one word in them (e.g. word1,word2,word3). File2 contained word2, but that row was not deleted.

I tried spreading the words apart to look like this: (word1 , word2 , word3), but the original command still did not work.

How can I remove a row that contains a word from File2 but may also contain other words?
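For reference, here is one plausible minimal reproduction of the behaviour (the file contents are made up; it assumes the entries in File2.csv are quoted, like the sample data in the answers, which is what prevents the substring match):

```shell
# Hypothetical sample data: File1 has a quoted multi-word field,
# File2 has a quoted single word.
printf '%s\n' '"a","word1,word2,word3"' '"b","word4"' > File1.csv
printf '"word2"\n' > File2.csv

# The quoted pattern "word2" never appears verbatim inside the field
# "word1,word2,word3", so grep -v keeps that row.
grep -v -F -f File2.csv File1.csv > File3.csv
```

After this, File3.csv still contains both rows, including the one with word2.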

Upvotes: 0

Views: 2032

Answers (2)

Birei

Reputation: 36252

One way using awk.

Content of script.awk:

BEGIN {
    ## Split fields on a double quote, optionally surrounded by spaces.
    FS = "[ ]*\"[ ]*"
}

## File with words, save them in a hash.
FNR == NR {
    words[ $2 ] = 1;
    next;
}

## File with multiple columns.
FNR < NR {
    ## Print the line as-is if the eighth field has no interesting value
    ## or if it is the first line of the file (the header).
    if ( $8 == "N/A" || FNR == 1 ) {
        print $0
        next
    }

    ## Split interested field with commas. Traverse it searching for a
    ## word saved from first file. Print line only if not found.

    ## split() returns the number of elements, so use that as the loop
    ## bound instead of length(array) (a gawk-only extension).
    len = split( $8, array, /[ ]*,[ ]*/ )
    for ( i = 1; i <= len; i++ ) {

        if ( array[ i ] in words ) {
            found = 1
            break
        }
    }
    if ( ! found ) {
        print $0
    }
    found = 0
}

Assuming File1.csv and File2.csv have the content provided in the comments of Thor's answer (I suggest adding that information to the question), run the script like this:

awk -f script.awk File2.csv File1.csv

With the following output:

"DNSName","IP","OS","CVE","Name","Risk"
"ex.example.com","1.2.3.4","Linux","N/A","HTTP 1.1 Protocol Detected","Information"
"ex.example.com","1.2.3.4","Linux","CVE-2011-3048","LibPNG Memory Corruption Vulnerability (20120329) - RHEL5","High"
"ex.example.com","1.2.3.4","Linux","CVE-2012-2141","Net-SNMP Denial of Service (Zero-Day) - RHEL5","Medium"
"ex.example.com","1.2.3.4","Linux","N/A","Web Application index.php?s=-badrow Detected","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","Apache HTTPD Server Version Out Of Date","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","PHP Unsupported Version Detected","High"
"ex.example.com","1.2.3.4","Linux","N/A","HBSS Common Management Agent - UNIX/Linux","High"

Upvotes: 1

Thor

Reputation: 47089

You could split the lines containing multiple patterns in File2.csv into separate lines.

The command below uses tr to convert lines such as word1,word2 into separate lines before using them as patterns. The <(...) construct (process substitution) acts as a temporary file/FIFO (tested in bash):

grep -v -F -f <(tr ',' '\n' < File2.csv) File1.csv > File3.csv
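To see the intermediate step, here is a toy run (file contents are made up; process substitution requires bash):

```shell
# Hypothetical data: one comma-separated pattern line, two data lines.
printf 'word1,word2\n' > File2.csv
printf '%s\n' 'a line with word2 in it' 'an unrelated line' > File1.csv

# tr turns "word1,word2" into one pattern per line, so grep -v -F
# drops any line containing any of the individual words.
grep -v -F -f <(tr ',' '\n' < File2.csv) File1.csv > File3.csv
```

Only the line without any of the words survives in File3.csv.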

Upvotes: 0
