Reputation: 11
I searched and found very similar problems, but unfortunately none of the solutions worked for my large dataset. What I want to do is compare fileA and fileB and write out the matching lines of fileB, appending the relevant information (the second column) from fileA.
Here is fileA:
TCC Reg
TGA Reg
TTG Reg
TAG None
AAA None
And here is fileB:
1 GCT 1883127 302868 16.08
2 GGG 1779189 284102 15.97
3 TCC 1309842 217491 16.60
4 TAA 1384070 168924 12.20
5 TAG 892324 140634 15.76
The output file I want to write is:
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
I have tried grep -f and
awk 'FNR==NR{a[$1];next}($1 in a){print}' fileA fileB > outputfile
separately, but neither worked.
Upvotes: 1
Views: 55
Reputation: 133428
The following awk command may also help you with this.
awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' fileB fileA
The output will be as follows:
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
EDIT: Adding a non-one-liner form of the solution, along with an explanation.
awk '
FNR==NR{        ##FNR and NR are built-in awk variables; both count the lines read so far.
                ##FNR is RESET whenever a new Input_file starts, while NR keeps increasing until all Input_file(s) are read.
                ##So this condition is TRUE only while the first Input_file (fileB) is being read.
  a[$2]=$0      ##Create an array named a, indexed by $2 (second field) of the current fileB line, whose value is the whole current line.
  next          ##next is a built-in awk statement that skips all further statements for the current line.
}
($1 in a){      ##This condition is only reached for lines of the second Input_file (fileA).
                ##If $1 (first field) of the current fileA line is an index of array a, do the following.
  print a[$1],$2  ##Print the stored fileB line for index $1, followed by $2 of the current fileA line, as required.
}
' fileB fileA   ##Input_file(s): fileB must come first so that array a is filled before fileA is read.
Upvotes: 1
Reputation: 67467
awk to the rescue!
$ awk 'NR==FNR {a[$1]=$2; next}
$2 in a {print $0,a[$2]}' fileA fileB
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
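A commented version of the one-liner above may help with the large-dataset concern from the question. It reads fileA first and keeps only its small lookup table (the array a) in memory, then streams fileB line by line, so memory use does not grow with the size of fileB. The fileB-first variant in the other answer instead stores every line of fileB in an array. This is a sketch for demonstration only: the temporary directory (created with mktemp -d, assumed to be available as on Linux and macOS) and the variable names tmp and out are not part of the original answer.

```shell
#!/bin/sh
# Demo only: recreate the sample files from the question in a temporary
# directory (mktemp -d is assumed to be available, as on Linux and macOS).
tmp=$(mktemp -d)
printf '%s\n' 'TCC Reg' 'TGA Reg' 'TTG Reg' 'TAG None' 'AAA None' > "$tmp/fileA"
printf '%s\n' \
  '1 GCT 1883127 302868 16.08' \
  '2 GGG 1779189 284102 15.97' \
  '3 TCC 1309842 217491 16.60' \
  '4 TAA 1384070 168924 12.20' \
  '5 TAG 892324 140634 15.76' > "$tmp/fileB"

out=$(awk '
NR==FNR {          # true only while the first file (fileA) is being read
  a[$1] = $2       # lookup table: first field -> label, e.g. a["TCC"] = "Reg"
  next             # skip the rule below for fileA lines
}
$2 in a {          # a fileB line whose second field appears in fileA
  print $0, a[$2]  # print the whole fileB line, then the label from fileA
}
' "$tmp/fileA" "$tmp/fileB")

printf '%s\n' "$out"   # in real use, redirect the awk output to outputfile instead
rm -r "$tmp"
```

Matches are printed in fileB order. The fileB-first variant prints them in fileA order instead, which happens to give the same two lines for this sample data.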
Upvotes: 1