I have looked for a solution to this problem and found many that come close, but as a beginner in Linux I have not been able to work it out.
I have two tab-separated files that look like this:
file 1
chr10 chr10_13254_G_A 13254 A G 3320 0.0595045 395.11
chr10 chr10_13398_C_G 13398 G C 3320 0.0226898 150.66
chr10 chr10_13505_C_G 13505 G C 3320 0.0225377 149.65
file 2
chr10_13254_G_A 0.61184
chr10_13398_C_G 0.421707
chr10_13505_C_G 0.35884
I would like to match column 2 of file1 against column 1 of file2, append the matching value from column 2 of file2 to each line of file1, and write the result to file3, which should look like this:
chr10 chr10_13254_G_A 13254 A G 3320 0.0595045 395.11 0.61184
chr10 chr10_13398_C_G 13398 G C 3320 0.0226898 150.66 0.421707
chr10 chr10_13505_C_G 13505 G C 3320 0.0225377 149.65 0.35884
I tried the following code:
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$2]}' file1 file2 > file3
join would work here as well:
$ join -1 2 -2 1 file1 file2
Output:
chr10_13254_G_A chr10 13254 A G 3320 0.0595045 395.11 0.61184
chr10_13398_C_G chr10 13398 G C 3320 0.0226898 150.66 0.421707
chr10_13505_C_G chr10 13505 G C 3320 0.0225377 149.65 0.35884
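Note that join prints the join field first, so this output does not have the column order shown for file3. A minimal sketch that restores that order uses join's standard -o option to list the output fields explicitly (file.field), and -t to keep the output tab-separated. It assumes the inputs really are tab-separated, as the question states, and already sorted on the join fields, as the samples are:

```shell
# Recreate the sample inputs from the question as tab-separated files.
{
  printf 'chr10\tchr10_13254_G_A\t13254\tA\tG\t3320\t0.0595045\t395.11\n'
  printf 'chr10\tchr10_13398_C_G\t13398\tG\tC\t3320\t0.0226898\t150.66\n'
  printf 'chr10\tchr10_13505_C_G\t13505\tG\tC\t3320\t0.0225377\t149.65\n'
} > file1
{
  printf 'chr10_13254_G_A\t0.61184\n'
  printf 'chr10_13398_C_G\t0.421707\n'
  printf 'chr10_13505_C_G\t0.35884\n'
} > file2

tab=$(printf '\t')
# Join file1 field 2 with file2 field 1; -o prints all eight columns of
# file1 in their original order, followed by column 2 of file2.
join -t "$tab" -1 2 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,2.2 file1 file2 > file3
```

If file1 has a different number of columns in practice, the -o list has to be adjusted to match.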
man join:
For each pair of input lines with identical join fields, write a line
to standard output. The default join field is the first, delimited by
blanks.
...
-1 FIELD
       join on this FIELD of file 1
...
Important: FILE1 and FILE2 must be sorted on the join fields.
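The sample files happen to be sorted on their join fields already. For data that is not, one approach is to sort each file on its join field first and to use the same collation (here the C locale) for both sort and join, so that they agree on what "sorted" means. A sketch under those assumptions; the .sorted file names are hypothetical:

```shell
# Use one collation for both sort and join.
export LC_ALL=C

# Sample inputs from the question, deliberately written out of order.
{
  printf 'chr10\tchr10_13505_C_G\t13505\tG\tC\t3320\t0.0225377\t149.65\n'
  printf 'chr10\tchr10_13254_G_A\t13254\tA\tG\t3320\t0.0595045\t395.11\n'
  printf 'chr10\tchr10_13398_C_G\t13398\tG\tC\t3320\t0.0226898\t150.66\n'
} > file1
{
  printf 'chr10_13398_C_G\t0.421707\n'
  printf 'chr10_13505_C_G\t0.35884\n'
  printf 'chr10_13254_G_A\t0.61184\n'
} > file2

tab=$(printf '\t')
# Sort file1 on column 2 and file2 on column 1 (the join fields).
sort -t "$tab" -k2,2 file1 > file1.sorted
sort -t "$tab" -k1,1 file2 > file2.sorted

# Same join as above, now on sorted copies, with tab-separated output.
join -t "$tab" -1 2 -2 1 file1.sorted file2.sorted > file3
```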
With your shown samples, please try the following awk code.
awk 'FNR==NR{arr[$1]=$2;next} ($2 in arr){print $0,arr[$2]}' file2 file1
Explanation: here is a detailed explanation of the above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE for file2.
arr[$1]=$2 ##Creating array with index of $1 and value of $2 here.
next ##next will skip all further statements from here.
}
($2 in arr){ ##Checking condition if 2nd field is present in arr then do following.
print $0,arr[$2] ##Printing current line then value of array with index of $2.
}
' file2 file1 ##Mentioning Input_file names here.
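The essential difference from the attempt in the question is the order of the input files. That attempt reads file1 first, so its array is keyed on file1's first column (chr10) rather than on the IDs, and nothing matches. Also, print $0,arr[$2] separates the appended value with awk's default output separator, a space. A variant that sets OFS to a tab, and that keeps file1 lines without a match instead of dropping them, might look like this. The NA placeholder is an arbitrary choice for illustration, and file2 deliberately omits one of the question's IDs to show that case:

```shell
# Sample inputs from the question (tab-separated).
{
  printf 'chr10\tchr10_13254_G_A\t13254\tA\tG\t3320\t0.0595045\t395.11\n'
  printf 'chr10\tchr10_13398_C_G\t13398\tG\tC\t3320\t0.0226898\t150.66\n'
  printf 'chr10\tchr10_13505_C_G\t13505\tG\tC\t3320\t0.0225377\t149.65\n'
} > file1
# file2 deliberately omits chr10_13505_C_G to show the unmatched case.
{
  printf 'chr10_13254_G_A\t0.61184\n'
  printf 'chr10_13398_C_G\t0.421707\n'
} > file2

# Read file2 first (FNR==NR) to build the ID -> value lookup, then append
# the value (or the placeholder NA) to every line of file1.
awk -F '\t' -v OFS='\t' '
  FNR == NR { arr[$1] = $2; next }
  { print $0, (($2 in arr) ? arr[$2] : "NA") }
' file2 file1 > file3
```

Drop the ternary and go back to the ($2 in arr) condition if unmatched lines should be skipped, as in the answer above.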