CEL

Reputation: 33

Match a column between two files and write the combined data to a new file (Bash, command line)

I have two .txt files of different lengths and would like to do the following:

If a value in column 1 of file 1 is present in column 1 of file 2, print column 2 of file 2 followed by the whole corresponding line from file 1.

I have tried several awk variations but have been unsuccessful so far.

Thank you!

File 1:

MARKERNAME EA NEA BETA SE
10:1000706 T C -0.021786390809225 0.519667838651725
1:715265 G C 0.0310128798578049 0.0403763946716293
10:1002042 CCTT C 0.0337857775471699 0.0403300629299562

File 2:

CHR:BP SNP  CHR BP  GENPOS  ALLELE1 ALLELE0 A1FREQ  INFO    
1:715265 rs12184267 1   715265  0.0039411   G   C   0.964671
1:715367 rs12184277 1   715367  0.00394384  A   G   0.964588

Desired File 3:

SNP        MARKERNAME EA NEA BETA SE
rs12184267 1:715265 G C 0.0310128798578049 0.0403763946716293

Attempted:

awk -F'|' 'NR==FNR { a[$1]=1; next } ($1 in a) { print $3, $0 }' file1 file2
awk 'NR==FNR{A[$1]=$2;next}$0 in A{$0=A[$0]}1' file1 file2

Upvotes: 1

Views: 75

Answers (1)

RavinderSingh13

Reputation: 133428

Based on the samples you have shown, please try the following.

awk '
FNR==1{
  if(++count==1){ col=$0 }
  else{ print $2,col }
  next
}
FNR==NR{
  arr[$1]=$0
  next
}
($1 in arr){
  print $2,arr[$1]
}
' file1 file2

Explanation: a detailed, line-by-line explanation of the above.

awk '                              ##Start of the awk program.
FNR==1{                            ##True on the first line (header) of each input file.
  if(++count==1){ col=$0 }         ##First header seen (file1): save the whole line in col.
  else{ print $2,col }             ##Second header (file2): print its 2nd field, then the file1 header.
  next                             ##Skip the remaining rules for this line.
}
FNR==NR{                           ##True only while file1 is being read.
  arr[$1]=$0                       ##Store the whole line in arr, keyed by the 1st field.
  next                             ##Skip the remaining rules for this line.
}
($1 in arr){                       ##Reached only for file2 lines: is the 1st field a key in arr?
  print $2,arr[$1]                 ##Print the 2nd field of file2, then the matching file1 line.
}
' file1 file2                      ##Input file names, in this order.
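
To check this end to end, here is a minimal sketch that recreates the sample data from the question and redirects the result into a third file. The scratch directory and the variable name workdir are my own assumptions for illustration, and the file 2 rows are written with single spaces instead of the original spacing; awk's default field splitting treats runs of spaces and tabs the same way, so this should not change the result.

```shell
#!/bin/sh
# Scratch directory for the demo (hypothetical location, not from the question).
workdir=$(mktemp -d)

# Sample file 1 from the question.
cat > "$workdir/file1" <<'EOF'
MARKERNAME EA NEA BETA SE
10:1000706 T C -0.021786390809225 0.519667838651725
1:715265 G C 0.0310128798578049 0.0403763946716293
10:1002042 CCTT C 0.0337857775471699 0.0403300629299562
EOF

# Sample file 2 from the question (whitespace normalised to single spaces).
cat > "$workdir/file2" <<'EOF'
CHR:BP SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO
1:715265 rs12184267 1 715265 0.0039411 G C 0.964671
1:715367 rs12184277 1 715367 0.00394384 A G 0.964588
EOF

# Same program as above; the output is redirected into file3.
awk '
FNR==1{
  if(++count==1){ col=$0 }   # header of file1: remember it
  else{ print $2,col }       # header of file2: print combined header
  next
}
FNR==NR{
  arr[$1]=$0                 # index file1 lines by their 1st field
  next
}
($1 in arr){
  print $2,arr[$1]           # SNP from file2, then the matching file1 line
}
' "$workdir/file1" "$workdir/file2" > "$workdir/file3"

cat "$workdir/file3"
```

With the sample data this writes two lines to file3: the combined header and the single matching row for 1:715265. Note that the header fields are separated by a single space (the default OFS), not padded as in the desired output shown in the question.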

Upvotes: 5
