Katre

Reputation: 11

Match and write a pattern comparing two files?

I searched and found very similar questions, but none of the solutions worked for my large dataset. I want to compare fileA and fileB and write out the lines of fileB whose second column matches a key in fileA, appending the corresponding label from fileA to each matching line.

Here is the fileA:

TCC    Reg  
TGA    Reg  
TTG    Reg  
TAG    None  
AAA    None

and the fileB:

1       GCT    1883127 302868  16.08  
2       GGG    1779189 284102  15.97  
3       TCC    1309842 217491  16.60  
4       TAA    1384070 168924  12.20  
5       TAG    892324  140634  15.76  

The output file I want to write is :

3       TCC    1309842 217491  16.60  Reg          
5       TAG    892324  140634  15.76  None

I have tried grep -f and awk 'FNR==NR{a[$1];next}($1 in a){print}' fileA fileB > outputfile separately, but neither worked.

Upvotes: 1

Views: 55

Answers (2)

RavinderSingh13

Reputation: 133428

The following awk command may also help here.

awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' fileB fileA

Output will be as follows.

3       TCC    1309842 217491  16.60   Reg
5       TAG    892324  140634  15.76   None

EDIT: Adding a non-one-liner form of the solution, along with an explanation.

awk '
FNR==NR{ ##True only while the first Input_file (fileB) is being read. FNR and NR are built-in awk variables that both count lines;
         ##the only difference between them is that FNR is RESET whenever a new Input_file starts being read,
         ##whereas NR keeps increasing until all the Input_file(s) are read. So the two are equal only for the very first Input_file.
  a[$2]=$0;##Creating an array named a, indexed by $2 (second field) of the current line of fileB, with the whole current line as its value.
  next     ##next is an awk statement (not a variable) which skips all further statements for the current line.
}
($1 in a){ ##This condition is reached only once the first Input_file is done and the second Input_file (fileA) is being read.
           ##It is true if $1 (first field) of the current line of fileA is present as an index in array a.
  print a[$1],$2 ##Printing the stored fileB line whose index is $1, followed by $2 of the current fileA line, as per your requirement.
}
' fileB fileA ##Mentioning the Input_file(s) here: fileB first, then fileA.
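
One practical consequence of reading fileB first is worth a quick sketch: the whole of fileB is held in the array, and the output follows the line order of fileA rather than fileB. The file names order_fileA, order_fileB and order_output are made up for this example, and the data is a trimmed copy of the question's sample with the TAG line deliberately placed before TCC in order_fileA.

```shell
# Hypothetical scratch files; the TAG key is listed before TCC on purpose.
printf '%s\n' 'TAG    None' 'TCC    Reg' 'AAA    None' > order_fileA
printf '%s\n' \
  '1       GCT    1883127 302868  16.08' \
  '3       TCC    1309842 217491  16.60' \
  '5       TAG    892324  140634  15.76' > order_fileB

# Same one-liner as above: index the data file by its 2nd field, then walk the key file.
awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' order_fileB order_fileA > order_output

# Row 5 (TAG) is printed before row 3 (TCC), i.e. in key-file order.
cat order_output
```

If fileB is the large file and its original order matters, reading fileA first and streaming fileB may be the better fit.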

Upvotes: 1

karakfa

Reputation: 67467

awk to the rescue!

$ awk 'NR==FNR {a[$1]=$2; next} 
       $2 in a {print $0,a[$2]}' fileA fileB

3       TCC    1309842 217491  16.60   Reg
5       TAG    892324  140634  15.76   None
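
For anyone wanting to check this against the sample data, here is a small self-contained sketch. The names sample_fileA, sample_fileB, sample_attempt and sample_output are placeholders invented for this example. It also runs the awk command from the question: on this sample that command prints nothing, because it tests $1 of fileB (the row number) against keys taken from fileA, whereas the key sits in $2.

```shell
# Placeholder files holding the sample data from the question.
printf '%s\n' 'TCC    Reg' 'TGA    Reg' 'TTG    Reg' 'TAG    None' 'AAA    None' > sample_fileA
printf '%s\n' \
  '1       GCT    1883127 302868  16.08' \
  '2       GGG    1779189 284102  15.97' \
  '3       TCC    1309842 217491  16.60' \
  '4       TAA    1384070 168924  12.20' \
  '5       TAG    892324  140634  15.76' > sample_fileB

# Attempt from the question: compares the row number ($1) with the keys, so nothing matches.
awk 'FNR==NR{a[$1];next}($1 in a){print}' sample_fileA sample_fileB > sample_attempt

# Version from this answer: remember the label for each key, then look up the 2nd field of the data file.
awk 'NR==FNR {a[$1]=$2; next} $2 in a {print $0,a[$2]}' sample_fileA sample_fileB > sample_output

cat sample_output
```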

Upvotes: 1
