Reputation: 909
I am trying to compare two files and append another column if there is certain condition satisfied.
file1.txt
1 101 111 . BCX 123
1 298 306 . CCC 234
1 299 305 . DDD 345
file2.txt
1 101 111 BCX P1@QQQ
1 299 305 DDD P2@WWW
The output should be:
1 101 111 . BCX 123;P1@QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2@WWW
What I can do is, to only do this for the lines having a match:
awk 'NR==FNR{ a[$1,$2,$3,$4]=$5; next }{ s=SUBSEP; k=$1 s $2 s $3 s $5 }k in a{ print $0,a[k] }' file2.txt file1.txt
1 101 111 . BCX 123 P1@QQQ
1 299 305 . DDD 345 P2@WWW
But then, I am missing the second line in file1.
How can I still keep it even though there is no match with file2 regions?
Upvotes: 3
Views: 104
Reputation: 67507
another awk
$ awk ' {t=5-(NR==FNR); k=$1 FS $2 FS $3 FS $t}
NR==FNR {a[k]=$NF; next}
k in a {$0=$0 ";" a[k]}1' file2 file1
1 101 111 . BCX 123;P1@QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2@WWW
last component of the key is either 4th or 5th field based on first or second file input; set it accordingly and use a single k
variable in the script. Note that
t=5-(NR==FNR)
can be written as conventionally,
t=NR==FNR?4:5
Upvotes: 0
Reputation: 133538
Following awk
may help you in same.
awk 'FNR==NR{a[$1,$2,$3,$4]=$NF;next} (($1,$2,$3,$5) in a){print $0";"a[$1,$2,$3,$5];next} 1' file2.txt file1.txt
Output will be as follows.
1 101 111 . BCX 123;P1@QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2@WWW
Upvotes: 1
Reputation: 46856
If you want to print every line, you need your print command not to be limited by your condition.
awk '
NR==FNR {
a[$1,$2,$3,$4]=$5; next
}
{
s=SUBSEP; k=$1 s $2 s $3 s $5
}
k in a {
$6=$6 ";" a[k]
}
1' file2.txt file1.txt
The 1
is shorthand that says "print every line". It's a condition (without command statements) that always evaluates "true".
The k in a
condition simply replaces your existing 6th field with the concatenated one. If the condition is not met, the replacement doesn't happen, but we still print because of the 1
.
Upvotes: 3