Reputation: 1403

Compare multiple Columns and Append the result into another file

I have two files file1 and file2, Both the files have 5 columns. I want to compare first 4 columns of file1 with file2.

If they are equal, need to compare the 5th column. If 5th column values are different, need to print the file1's 5th column as file2's 6th column.

I have used below awk to compare two columns in two different files, but how to compare multiple columns and append the particular column in another file if matches found?

awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]'

file1:

111,item1,garde1,wing1,maingroup
123,item3,grade5,wing10,topcat
132,item2,grade3,wing7,middlecat
134,item2,grade3,wing7,middlecat
177,item8,gradeA,wing11,lowcat

file2:

111,item1,garde1,wing1,maingroup
123,item3,grade5,wing10,lowcat
132,item3,grade3,wing7,middlecat
126,item2,grade3,wing7,maingroup
177,item8,gradeA,wing11,lowcat

Desired output:

123,item3,grade5,wing10,lowcat,topcat

Upvotes: 0

Answers (2)

Ed Morton

Reputation: 203209

$ cat tst.awk
BEGIN { FS=OFS="," }
{ key = $0; sub("(,[^,]*){"NF-4"}$","",key) }
NR==FNR { file1[key] = $5; next }
(key in file1) && ($5 != file1[key]) {
    print $0, file1[key]
}

$ awk -f tst.awk file1 file2
123,item3,grade5,wing10,lowcat,topcat

Upvotes: 0

jas

Reputation: 10865

Awk can simulate multidimensional arrays by sequencing the indices. Underneath the indices are concatenated using the built-in SUBSEP variable as a separator:

 $ awk -F, -v OFS=, 'NR==FNR { a[$1,$2,$3,$4]=$5; next } a[$1,$2,$3,$4] && a[$1,$2,$3,$4] != $5 { print $0,a[$1,$2,$3,$4] }' file1.txt file2.txt

123,item3,grade5,wing10,lowcat,topcat

awk -F, -v OFS=,

Set both input and output separators to ,

NR==FNR { a[$1,$2,$3,$4]=$5; next }

Create an associative array from the first file relating the first four fields of each line to the fifth. When using a comma-separated list of values as an index, awk actually concatenates them using the value of the built-in SUBSEP variable as a separator. This is awk's way of simulating multidimensional arrays with a single subscript. You can set SUBSEP to any value you like but the default, which is a non-printing character unlikely to appear in the data, is usually fine. (You can also just do the trick yourself, something like a[$1 "|" $2 "|" $3 "|" $4], assuming you know that your data contains no vertical bars.)

a[$1,$2,$3,$4] && a[$1,$2,$3,$4] != $5 { print $0,a[$1,$2,$3,$4] }

Arriving here, we know we are looking at the second file. If the first four fields were found in the first file, and the $5 from the first file is different than the $5 in the second, print the line from the second file followed by the $5 from the first. (I am assuming here that no $5 from the first file will have a value that evaluates to false, such as 0 or empty.)

Upvotes: 1

Compare multiple Columns and Append the result into another file

Answers (2)

Related Questions