Using SED to replace capture group with regex pattern

Question

I need some help with a sed command that I thought would solve an issue I have. I basically have long text files that look something like this:

>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT

The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the numbers in the text after the > differ slightly. I want to change only the header line (the line that starts with >) so that it keeps everything up to and including the very last "_". The sed command I thought seemed logical is the following:

sed -i 's/>.*/TRINITY.*_/'

However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!

Also, just to make things clear, I expected my sed command to convert the top header into this:

>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT

Cyrus · Accepted Answer

This might help:

sed '/^>/s/[^_]*$//' file

Output:

>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
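To see the accepted answer's command in context, here is a minimal sketch that runs it on a small sample file. The second record and the helper name trim_headers are hypothetical, added only for illustration. The address /^>/ limits the edit to header lines, and [^_]*$ matches the trailing run of characters that contains no underscore, i.e. everything after the last "_". Deleting that run leaves the header ending in "_". Without the /^>/ address, every sequence line (which contains no underscore) would be matched in full and emptied.

```shell
#!/bin/sh
# trim_headers is a hypothetical helper name wrapping the accepted answer's command.
#   /^>/    : only touch header lines (those starting with ">").
#   [^_]*$  : the run of non-underscore characters at the end of the line,
#             i.e. everything after the last "_"; it is replaced with nothing.
trim_headers() {
  sed '/^>/s/[^_]*$//' "$@"
}

# Hypothetical two-record sample; the first header is the one from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
>TRINITY_DN112254_co_g2_i1 Len=512 path=[100:0-50]
GGTACCA
EOF

# Prints both records with headers trimmed after their last "_".
trim_headers "$sample"
rm -f "$sample"
```

To edit the file in place as the question intended, sed -i.bak '/^>/s/[^_]*$//' file is accepted by both GNU and BSD sed and keeps the original in file.bak.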

See: The Stack Overflow Regular Expressions FAQ
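Since the title asks about capture groups, here is a sketch (not part of the accepted answer) showing why the original attempt printed TRINITY.*_ literally, and an equivalent substitution that does use a capture group. Only the pattern side of s/pattern/replacement/ is a regular expression; the replacement is literal text apart from &, \1 through \9, and backslash escapes. The header value is the one from the question.

```shell
#!/bin/sh
header='>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]'

# Original attempt: ">.*" matches the whole header, and the replacement
# "TRINITY.*_" is inserted as literal text, so the line becomes TRINITY.*_
printf '%s\n' "$header" | sed 's/>.*/TRINITY.*_/'

# Capture-group version: \(>.*_\) captures from ">" up to the LAST "_"
# (.* is greedy), the trailing .* matches the rest of the line, and
# \1 writes back only the captured part.
printf '%s\n' "$header" | sed '/^>/s/^\(>.*_\).*/\1/'
```

Because .* is greedy, the group extends to the last underscore on the line, so this gives the same result as the accepted answer: >TRINITY_DN112253_co_g1_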