AVA

Reputation: 101

Adding a new column to a file by matching a column from another file in Linux

I have looked for a solution to this problem and found many that come close, but as a beginner on Linux I have not been able to work it out.

I have two tab-separated files that look like:

file 1

chr10 chr10_13254_G_A 13254 A G 3320 0.0595045 395.11
chr10 chr10_13398_C_G 13398 G C 3320 0.0226898 150.66
chr10 chr10_13505_C_G 13505 G C 3320 0.0225377 149.65

file 2

chr10_13254_G_A 0.61184
chr10_13398_C_G 0.421707
chr10_13505_C_G 0.35884

I would like to match column 2 of file 1 against column 1 of file 2, append column 2 of file 2 to the matching line of file 1, and write the result to a file 3 that looks like:

chr10 chr10_13254_G_A 13254 A G 3320 0.0595045 395.11 0.61184
chr10 chr10_13398_C_G 13398 G C 3320 0.0226898 150.66 0.421707
chr10 chr10_13505_C_G 13505 G C 3320 0.0225377 149.65 0.35884

I tried the code below:

awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$2]}' file1 file2 > file3

Upvotes: 1

Views: 1266

Answers (2)

James Brown

Reputation: 37404

join would work here as well:

$ join -1 2 -2 1 file1 file2

Output:

chr10_13254_G_A chr10 13254 A G 3320 0.0595045 395.11 0.61184
chr10_13398_C_G chr10 13398 G C 3320 0.0226898 150.66 0.421707
chr10_13505_C_G chr10 13505 G C 3320 0.0225377 149.65 0.35884

man join:

For  each  pair of input lines with identical join fields, write a line
to standard output.  The default join field is the first, delimited  by
blanks.
- - 
-1 FIELD
       join on this FIELD of file 1
- -
Important:  FILE1  and  FILE2 must be sorted on the join fields.
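Because of the sorting requirement quoted above, input that is not already ordered by the ID column has to be sorted first. The following is a minimal sketch, assuming tab-separated input and reusing the sample rows from the question in shuffled order. The intermediate names file1.sorted and file2.sorted are made up for illustration, and the -o list is only there to restore the column order the question asks for:

```shell
# Sample rows from the question, deliberately out of order (tab-separated).
printf 'chr10\tchr10_13505_C_G\t13505\tG\tC\t3320\t0.0225377\t149.65\n' >  file1
printf 'chr10\tchr10_13254_G_A\t13254\tA\tG\t3320\t0.0595045\t395.11\n' >> file1
printf 'chr10\tchr10_13398_C_G\t13398\tG\tC\t3320\t0.0226898\t150.66\n' >> file1
printf 'chr10_13398_C_G\t0.421707\n' >  file2
printf 'chr10_13505_C_G\t0.35884\n'  >> file2
printf 'chr10_13254_G_A\t0.61184\n'  >> file2

TAB=$(printf '\t')
export LC_ALL=C   # make sort and join agree on the collation order

# Sort each file on its join field: column 2 of file1, column 1 of file2.
sort -t "$TAB" -k2,2 file1 > file1.sorted   # hypothetical intermediate file
sort -t "$TAB" -k1,1 file2 > file2.sorted   # hypothetical intermediate file

# -o lists the output fields as FILE.FIELD, so the columns of file1 keep
# their original order and column 2 of file2 is appended at the end.
join -t "$TAB" -1 2 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,2.2 \
    file1.sorted file2.sorted > file3
cat file3
```

Note that the lines of file3 come out sorted by ID, not in the original order of file1.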

Upvotes: 2

RavinderSingh13

Reputation: 133518

With your shown samples, please try the following awk code.

awk 'FNR==NR{arr[$1]=$2;next} ($2 in arr){print $0,arr[$2]}' file2 file1

Explanation: a detailed, line-by-line explanation of the code above.

awk '                ##Starting awk program from here.
FNR==NR{             ##Checking condition FNR==NR which will be TRUE for file2.
  arr[$1]=$2         ##Creating array with index of $1 and value of $2 here.
  next               ##next will skip all further statements from here.
}
($2 in arr){         ##Checking condition if 2nd field is present in arr then do following.
  print $0,arr[$2]   ##Printing current line then value of array with index of $2.
}
' file2 file1        ##Mentioning Input_file names here.
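A possible variation on the code above, sketched under two assumptions that are not part of the original answer: the files are strictly tab-separated (so FS and OFS are both set to a tab), and lines of file1 with no match in file2 should be kept with a placeholder instead of being dropped. The placeholder "NA" is an arbitrary choice. To show the unmatched case, file2 here deliberately leaves out one of the IDs from the question:

```shell
# Sample rows from the question (tab-separated).
printf 'chr10\tchr10_13254_G_A\t13254\tA\tG\t3320\t0.0595045\t395.11\n' >  file1
printf 'chr10\tchr10_13398_C_G\t13398\tG\tC\t3320\t0.0226898\t150.66\n' >> file1
printf 'chr10\tchr10_13505_C_G\t13505\tG\tC\t3320\t0.0225377\t149.65\n' >> file1
# file2 with chr10_13398_C_G left out on purpose.
printf 'chr10_13254_G_A\t0.61184\n' >  file2
printf 'chr10_13505_C_G\t0.35884\n' >> file2

awk '
BEGIN { FS = OFS = "\t" }              # read and write tab-separated fields
FNR == NR { arr[$1] = $2; next }       # first file (file2): remember value per ID
{
  # second file (file1): append the stored value, or "NA" if the ID is unknown
  print $0, (($2 in arr) ? arr[$2] : "NA")
}
' file2 file1 > file3
cat file3
```

Unlike join, this keeps file1 in its original order and does not need either file to be sorted.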

Upvotes: 2
