Luke Fowler
Luke Fowler

Reputation: 93

bash - if two columns match then append column

I am trying to add a column if two prior fields have already been found in a file.

I have a comma delimited file with a large number of entries and I need to find all lines that match on two columns, the second column and the seventh. If both are found on multiple lines then add an eighth column saying "shared".

file contents:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314

output desired:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

I've searched and found this link Awk - matching on 2 columns for differents lines but it doesn't quite do what I need, it only matches on the following line.

I could do something like this:

while IFS=',' read host device blk poolnum porta portb serial

    ldev_count=`cat outputtest.txt | grep -iw $device | grep -iw $serial | wc -l`
    if [[ $ldev_count > 1 ]] ; then
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
    else
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
    fi
done < outputtest.txt

But it's extremely slow. I'm hoping to find a better solution.

Thanks for any help.

edited for formatting

Upvotes: 1

Views: 978

Answers (2)

Juan Diego Godoy Robles
Juan Diego Godoy Robles

Reputation: 14945

You might need this:

awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
          a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file

Result:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

Explanation

We're going to iterate the file two times:

First: NR==FNR{a[$2]++;b[$7]++;next}

We get the repetitions of each column and store it in a and b arrays.

Second: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1

To filter the lines that match the number of reps you expect, this number must be greater than one for both columns in order to add a new ending column :$(NF+1)="shared".

Note: 1 is just a shortcut to avoid using the print statement.

Upvotes: 3

RavinderSingh13
RavinderSingh13

Reputation: 133428

Could you please try following and let me know if this helps you.

awk -F, 'FNR==NR{a[$2,$7]++;next}  a[$2,$7]>1{print $0",shared"}'  Input_file  Input_file

Output will be as follows.

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

EDIT: If you want to print matching lines with string "shared" and non-matching lines simply print then following may help you in same.

awk -F, '           ##Creating field delimiter as comma.
FNR==NR{            ##FNR==NR is a condition which will be TRUE when first Input_file is being read.
  a[$2,$7]++;       ##creating an array named a whose index is $2,$7(second and 7th field) and incrementing its value with 1 each time same elements come.
  next              ##Using next keyword will skip all further statements.
}
a[$2,$7]>1{         ##This condition will be TRUE only when 2nd Input_file is being read, check if array a value in index of $2,$7 is greater than 1.
  print $0",shared" ##Printing the current line with keyword shared at last of line.
  next;
}
1
' Input_file Input_file ##Mentioning the Input_file twice here.

Upvotes: 2

Related Questions