Sed regex between known word and unknown integer

Question

I can't quite get the regex I need to solve this, so asking the SO wizards for help!

Given:

LOCUS       NODE_96_length_17326_cov_8.76428_ID_1>17327 bp   DNA linear
LOCUS       NODE_97_length_17208_cov_6.56803_ID_1>17208 bp   DNA linear
LOCUS       NODE_98_length_17111_cov_6.60638_ID_1>17111 bp   DNA linear
LOCUS       NODE_99_length_17092_cov_6.7682_ID_19717092 bp   DNA linear
LOCUS       NODE_9_length_59921_cov_8.04963_ID_1759921 bp   DNA linear

I need to replace the string between NODE and the sequence of numbers at the end of that same string. The character preceding the numbers (e.g. 17327 in line 1) can be either a > or a _. So basically I need to replace everything from NODE up to and including the last > or _, or match up until a multi-digit integer of unknown length.

Best I'd managed so far was:

sed 's/\(NODE.*\)\(>|_\)/newstring/'

But I know this doesn't work.

Just to make it painfully clear, this would be the desired output.

LOCUS       newstring 17327 bp   DNA linear
LOCUS       newstring 17208 bp   DNA linear
LOCUS       newstring 17111 bp   DNA linear
LOCUS       newstring 19717092 bp   DNA linear
LOCUS       newstring 1759921 bp   DNA linear

anubhava · Accepted Answer

You don't need to use any group since you are not using any back-references. You can use:

sed 's/NODE[^[:blank:]]*[_>]/newstring /' file

LOCUS       newstring 17327 bp   DNA linear
LOCUS       newstring 17208 bp   DNA linear
LOCUS       newstring 17111 bp   DNA linear
LOCUS       newstring 19717092 bp   DNA linear
LOCUS       newstring 1759921 bp   DNA linear
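
The reason this works is that `[^[:blank:]]*` is greedy but cannot cross whitespace, so it runs to the end of the NODE token and then backtracks to the last `_` or `>` in it. Here is a minimal sketch to check that on two of the sample lines. The variable names (`input`, `result`, `strict`) are just for illustration, and the second command is a stricter variant (not part of the answer above) that also requires digits after the separator:

```shell
# Two sample lines from the question, one for each separator style.
input='LOCUS       NODE_96_length_17326_cov_8.76428_ID_1>17327 bp   DNA linear
LOCUS       NODE_9_length_59921_cov_8.04963_ID_1759921 bp   DNA linear'

# The answer's command: match greedily inside the token, stop at the last _ or >.
result=$(printf '%s\n' "$input" | sed 's/NODE[^[:blank:]]*[_>]/newstring /')
printf '%s\n' "$result"

# Stricter variant (an assumption, not from the answer): capture the trailing
# digits in a group and put them back with \1.
strict=$(printf '%s\n' "$input" | sed 's/NODE[^[:blank:]]*[_>]\([0-9][0-9]*\)/newstring \1/')
printf '%s\n' "$strict"
```

Both commands should print the same two lines here. The stricter form only matters if a NODE token could end in something other than digits after its last separator.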
