Reputation: 33
I need some help with a sed command that I thought would help solve an issue I have. I have basically have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except the text after the > differs slightly by numbers. I want to replace only this header part (the part followed by the >) to everything up until the very last “_” the sed command I thought seemed logical is the following:
sed -i ‘s/>.*/TRINITY.*_/‘
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
(Also.. just to make things clear, I thought that my sed command would convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
Upvotes: 1
Views: 245
Reputation: 88583
This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_ ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
Upvotes: 1