bash - if two columns match then append column

Question

I am trying to add a column if two prior fields have already been found in a file.

I have a comma delimited file with a large number of entries and I need to find all lines that match on two columns, the second column and the seventh. If both are found on multiple lines then add an eighth column saying "shared".

file contents:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314

output desired:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

I've searched and found this link Awk - matching on 2 columns for differents lines but it doesn't quite do what I need, it only matches on the following line.

I could do something like this:

while IFS=',' read host device blk poolnum porta portb serial

    ldev_count=`cat outputtest.txt | grep -iw $device | grep -iw $serial | wc -l`
    if [[ $ldev_count > 1 ]] ; then
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
    else
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
    fi
done < outputtest.txt

But it's extremely slow. I'm hoping to find a better solution.

Thanks for any help.

edited for formatting

Juan Diego Godoy Robles · Accepted Answer

You might need this:

awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
          a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file

Result:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

Explanation

We're going to iterate the file two times:

First: NR==FNR{a[$2]++;b[$7]++;next}

We get the repetitions of each column and store it in a and b arrays.

Second: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1

To filter the lines that match the number of reps you expect, this number must be greater than one for both columns in order to add a new ending column :$(NF+1)="shared".

Note: 1 is just a shortcut to avoid using the print statement.

bash - if two columns match then append column

Answers (2)

Related Questions