Reputation: 3
I have two files. I would like to replace a certain string in file 1 with the contents of file 2, based on a common string.
file 1
Chr5 psl2gff exon 15907715 15907933 . + . NM_001046410
Chr2 psl2gff exon 8898358 8898394 . + . NM_001192190
file 2
NM_001046410 gene_id TUBA1D; transcript_id tubulin, alpha 3d
NM_001192190 gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
output
Chr5 psl2gff exon 15907715 15907933 . + . gene_id TUBA1D; transcript_id tubulin, alpha 3d
Chr2 psl2gff exon 8898358 8898394 . + . gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
In file 1 there are multiple instances of the same string; however, file 2 only has it once. I would like every instance of NM_**** in file 1 to be replaced by the rest of the line from file 2 whose first column matches it. After the replacement, the NM_**** string itself should no longer appear in the file.
I am very new to bash etc. I have looked all over the place for a way to do this, but none so far have worked. Also, there are over 5000 lines in file 2, many more in file 1.
Any help would be much appreciated!
Thanks.
Upvotes: 0
Views: 224
Reputation: 67467
This is a join operation. If the files are sorted on the join key, and if the white space is not significant, the easiest will be
$ join -1 9 -2 1 file1 file2 | cut -d' ' -f2-
Chr5 psl2gff exon 15907715 15907933 . + . gene_id TUBA1D; transcript_id tubulin, alpha 3d
Chr2 psl2gff exon 8898358 8898394 . + . gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
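If the files are not sorted yet, one option is to sort copies of them on the join key first. The following is only a sketch under a few assumptions: the fields are separated by single blanks, the ID is always field 9 of file1, and it is acceptable that the output comes out ordered by ID rather than in file1's original order. The temporary directory and the `.sorted` file names are made up for the example, and file1 is written here with its two sample lines swapped to show the effect of sorting.

```shell
#!/bin/sh
# Sample data from the question; file1 is written in reverse order on purpose.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
Chr2 psl2gff exon 8898358 8898394 . + . NM_001192190
Chr5 psl2gff exon 15907715 15907933 . + . NM_001046410
EOF
cat > "$dir/file2" <<'EOF'
NM_001046410 gene_id TUBA1D; transcript_id tubulin, alpha 3d
NM_001192190 gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
EOF

# sort and join must agree on collation, so force the C locale for both.
LC_ALL=C sort -k9,9 "$dir/file1" > "$dir/file1.sorted"   # key is field 9
LC_ALL=C sort -k1,1 "$dir/file2" > "$dir/file2.sorted"   # key is field 1

# join prints the key first, then the remaining fields of file1, then the
# remaining fields of file2; cut drops the leading key column.
result=$(LC_ALL=C join -1 9 -2 1 "$dir/file1.sorted" "$dir/file2.sorted" |
    cut -d' ' -f2-)
printf '%s\n' "$result"
rm -rf "$dir"
```

By default join drops lines of file1 whose ID has no match in file2; adding `-a 1` keeps them, with the ID still attached.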
If the files are not sorted and white space is important, awk will be a better solution
$ awk 'NR==FNR {k=$1; $1=""; a[k]=$0; next}
$NF in a {sub(FS $NF"$",a[$NF])}1' file2 file1
Chr5 psl2gff exon 15907715 15907933 . + . gene_id TUBA1D; transcript_id tubulin, alpha 3d
Chr2 psl2gff exon 8898358 8898394 . + . gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
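One thing to watch with the sub() approach: the ID is used as a regular expression and the replacement text is scanned for `&`, so an ID containing `.` or a description containing `&` could give surprising results. The sample data has neither, so this is only a precaution. A possible variant, sketched below with commented steps, cuts the ID off by length instead of by pattern. It assumes the ID is the last field of each file1 line and that there is no trailing white space; the variable names `map`, `key`, `rest` and `id` are chosen just for this example.

```shell
#!/bin/sh
# Sample data copied from the question.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
Chr5 psl2gff exon 15907715 15907933 . + . NM_001046410
Chr2 psl2gff exon 8898358 8898394 . + . NM_001192190
EOF
cat > "$dir/file2" <<'EOF'
NM_001046410 gene_id TUBA1D; transcript_id tubulin, alpha 3d
NM_001192190 gene_id BOD1L1; transcript_id biorientation of chromosomes in cell division 1 like 1
EOF

result=$(awk '
    # First file on the command line (file2): remember the text after each ID.
    NR == FNR {
        key = $1
        rest = $0
        sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest)   # strip the ID and nearby blanks
        map[key] = rest
        next
    }
    # Second file (file1): if the last field is a known ID, keep everything
    # before it unchanged and append the stored text. No regex is built
    # from the data, so special characters in IDs are harmless.
    $NF in map {
        id = $NF
        $0 = substr($0, 1, length($0) - length(id)) map[id]
    }
    { print }
' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
rm -rf "$dir"
```

Lines of file1 whose last field is not found in file2 are printed unchanged, and the white space in the untouched part of each line is preserved.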
The exercise for you is to understand the code. There are many examples (>100) on this site for exactly this question, many with commented scripts, some of which were written by me.
Upvotes: 1