Reputation: 149
I've been searching for a couple of days, but I haven't found the right answer.
I have two files that look like this:
File1:
>contig-100_23331 length_200 read_count_4043
TCAG...
>contig-100_23332 length_200 read_count_4508
TTCA...
>contig-100_23333 length_200 read_count_184
TTCC...
File2:
>contig-100_23331_Cov:_30.9135
>contig-100_23332_Cov:_125.591
>contig-100_23333_Cov:_5.97537
I want to replace the name lines (>contig... length...) in File1 with the corresponding name lines in File2. Note that File2 contains only the contig names (no sequences).
I suppose there's a way to do this with sed, but I can't find the solution.
Thanks in advance!
Upvotes: 1
Views: 124
Reputation: 753805
One possibility is to use sed to create a sed script from File2 that is then applied to File1:
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' File2 > sed.script
sed -f sed.script File1 > File.Out
rm -f sed.script
For the sample File2, sed.script would contain:
s%^>contig-100_23331 %>contig-100_23331_Cov:_30.9135 %
s%^>contig-100_23332 %>contig-100_23332_Cov:_125.591 %
s%^>contig-100_23333 %>contig-100_23333_Cov:_5.97537 %
For the sample File1, the output of the sed processing would be:
>contig-100_23331_Cov:_30.9135 length_200 read_count_4043
TCAG...
>contig-100_23332_Cov:_125.591 length_200 read_count_4508
TTCA...
>contig-100_23333_Cov:_5.97537 length_200 read_count_184
TTCC...
Some versions of sed may have problems with 23k lines in a sed script. If that's a problem for you, generate sed.script, split it (with split) into smaller chunks (e.g. 1000 lines each), and apply the chunks one after another with sed -f chunk, feeding the output of each run into the next. That's painful, but necessary. Historically, HP-UX (archaic versions, like HP-UX 9 or 10) had rather limited versions of sed that could handle only a few hundred commands in a sed script.
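A minimal sketch of the chunked approach, assuming the script name sed.script from above and 1000-line chunks. The chunk prefix sed.chunk., the variable lines_per_chunk and the temporary files work.tmp/work.next are hypothetical names chosen for illustration, and the sample files are recreated from the question so the script runs on its own. The key point is that each chunk is applied to the output of the previous chunk, not to the original File1.

```shell
#!/bin/sh
# Sample input from the question (each sequence on a single line).
printf '%s\n' '>contig-100_23331 length_200 read_count_4043' 'TCAG...' \
    '>contig-100_23332 length_200 read_count_4508' 'TTCA...' \
    '>contig-100_23333 length_200 read_count_184' 'TTCC...' > File1
printf '%s\n' '>contig-100_23331_Cov:_30.9135' \
    '>contig-100_23332_Cov:_125.591' \
    '>contig-100_23333_Cov:_5.97537' > File2

# Generate the sed script exactly as in the answer.
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' File2 > sed.script

# Split it into chunks of at most 1000 commands: sed.chunk.aa, sed.chunk.ab, ...
lines_per_chunk=1000   # hypothetical; pick what your sed can handle
split -l "$lines_per_chunk" sed.script sed.chunk.

# Apply the chunks in sequence: each pass reads the previous pass's output.
cp File1 work.tmp
for chunk in sed.chunk.*; do
    [ -f "$chunk" ] || continue   # no chunks at all (empty File2)
    sed -f "$chunk" work.tmp > work.next
    mv work.next work.tmp
done
mv work.tmp File.Out
rm -f sed.script sed.chunk.*
```

Applying the chunks in any order is safe here: once a header has been rewritten, the id is followed by _Cov rather than a space, so no later ^>contig-N_N  pattern can match it again.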
Given that you're using bash, you can avoid the explicit intermediate file with process substitution:
sed -f <(sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' File2) File1 > File.Out
However, you should validate the script before using that notation.
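One way to do that validation, sketched under the assumption that every File2 line has the >contig-N_N_... shape shown in the question: generate the script on stdout, look at the first few commands, and count any generated lines that are not s%^...%...% commands. The variable name bad is hypothetical, and the sample File2 is recreated so the check runs on its own.

```shell
#!/bin/sh
# Sample File2 from the question.
printf '%s\n' '>contig-100_23331_Cov:_30.9135' \
    '>contig-100_23332_Cov:_125.591' \
    '>contig-100_23333_Cov:_5.97537' > File2

# Eyeball the first few generated commands (nothing is written to disk).
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' File2 | head -n 3

# Count generated lines NOT of the form  s%^>contig-N_N %>contig-... %
# A File2 line the pattern fails to match is copied through unchanged (and
# would be read by sed as a bogus command); a name containing % would break
# the s%...%...% delimiters.
bad=$(sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' File2 |
      grep -c -v '^s%^>contig-[0-9]*_[0-9]* %>contig-[^%]* %$')

if [ "$bad" -ne 0 ]; then
    echo "generated script has $bad unexpected line(s); not running it" >&2
else
    echo "generated script looks OK"
fi
```

Only once this reports no unexpected lines is it reasonably safe to switch to the process-substitution one-liner, which gives you no file to inspect afterwards.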
Upvotes: 2
Reputation: 1314
DISCLAIMER: I've never done this...
You might want to use the join command to merge the files.
You may have to produce an intermediate file or stream from File2 with an extra empty line after each name, so that the lines of both files match up one-to-one.
Hope this helps.
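A rough sketch of the join idea. Instead of padding File2 with empty lines, it collapses File1 to one line per contig with paste, so that both files can be keyed on the contig id. It assumes each sequence sits on a single line, each header has exactly three space-separated fields, and the data contains no tabs; the temporary names f1.keyed and f2.keyed are hypothetical. Because join needs sorted input, the output comes out in key order, which for the sample happens to match the original order.

```shell
#!/bin/sh
# Sample input from the question.
printf '%s\n' '>contig-100_23331 length_200 read_count_4043' 'TCAG...' \
    '>contig-100_23332 length_200 read_count_4508' 'TTCA...' \
    '>contig-100_23333 length_200 read_count_184' 'TTCC...' > File1
printf '%s\n' '>contig-100_23331_Cov:_30.9135' \
    '>contig-100_23332_Cov:_125.591' \
    '>contig-100_23333_Cov:_5.97537' > File2

# Use the C locale so that sort and join agree on the ordering.
LC_ALL=C
export LC_ALL

# File2 -> "key newname", e.g. ">contig-100_23331 >contig-100_23331_Cov:_30.9135"
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/\1 &/' File2 | sort -k1,1 > f2.keyed

# File1 -> one record per line, "header<TAB>sequence" (paste - - pairs up lines);
# the first field of the header is the same key.
paste - - < File1 | sort -k1,1 > f1.keyed

# join splits on blanks (spaces and tabs), so f1.keyed has the fields
# key, length_..., read_count_..., sequence. Output the new name plus those,
# then put the sequence back on its own line.
join -o 1.2,2.2,2.3,2.4 f2.keyed f1.keyed |
    awk '{ print $1, $2, $3; print $4 }' > File.Out
rm -f f1.keyed f2.keyed
```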
Upvotes: 0