Reputation: 93
I am trying to add a column if two prior fields have already been found in a file.
I have a comma delimited file with a large number of entries and I need to find all lines that match on two columns, the second column and the seventh. If both are found on multiple lines then add an eighth column saying "shared".
file contents:
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314
output desired:
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
I've searched and found this link Awk - matching on 2 columns for differents lines but it doesn't quite do what I need, it only matches on the following line.
I could do something like this:
while IFS=',' read host device blk poolnum porta portb serial
ldev_count=`cat outputtest.txt | grep -iw $device | grep -iw $serial | wc -l`
if [[ $ldev_count > 1 ]] ; then
echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
else
echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
fi
done < outputtest.txt
But it's extremely slow. I'm hoping to find a better solution.
Thanks for any help.
edited for formatting
Upvotes: 1
Views: 978
Reputation: 14945
You might need this:
awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file
Result:
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
Explanation
We're going to iterate the file two times:
First: NR==FNR{a[$2]++;b[$7]++;next}
We get the repetitions of each column and store it in a
and b
arrays.
Second: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1
To filter the lines that match the number of reps you expect, this number must be greater than one for both columns in order to add a new ending column :$(NF+1)="shared"
.
Note: 1
is just a shortcut to avoid using the print statement.
Upvotes: 3
Reputation: 133428
Could you please try following and let me know if this helps you.
awk -F, 'FNR==NR{a[$2,$7]++;next} a[$2,$7]>1{print $0",shared"}' Input_file Input_file
Output will be as follows.
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
EDIT: If you want to print matching lines with string "shared" and non-matching lines simply print then following may help you in same.
awk -F, ' ##Creating field delimiter as comma.
FNR==NR{ ##FNR==NR is a condition which will be TRUE when first Input_file is being read.
a[$2,$7]++; ##creating an array named a whose index is $2,$7(second and 7th field) and incrementing its value with 1 each time same elements come.
next ##Using next keyword will skip all further statements.
}
a[$2,$7]>1{ ##This condition will be TRUE only when 2nd Input_file is being read, check if array a value in index of $2,$7 is greater than 1.
print $0",shared" ##Printing the current line with keyword shared at last of line.
next;
}
1
' Input_file Input_file ##Mentioning the Input_file twice here.
Upvotes: 2