T_R

Reputation: 99

How to flip the order of the parts of a sequence header name around a specific 'middle point'

I am looking for a way to simply change the order of chunks of a sequence header name in UNIX.

The sequence name I have now is:

>PIENAPT00000000258_pienapg00000000172

But I need:

>pienapg00000000172_PIENAPT00000000258

I tried to do this with sed (my attempt is below), but it does not work and I have not found a solution yet. Can someone help me?

sed -E 's/PIENAPT.+\_pienapg.+/pienapg.+\_PIENAPT.+//' 

Upvotes: 1

Views: 65

Answers (2)

RavinderSingh13

Reputation: 133528

Since you asked about sed specifically, try the following:

sed 's/\([^_]*\)\(_\)\(.*\)/\3\2\1/'  Input_file

Or, if the lines in your Input_file start with > (this was not clear from the question because of the quote formatting), try this instead:

sed 's/\(^>\)\([^_]*\)\(_\)\(.*\)/\1\4\3\2/'  Input_file

Or, as @potong suggested in a comment, try:

sed -E 's/>(.*)_(.*)/>\2_\1/'  Input_file
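If Input_file is a full FASTA file rather than a list of headers, it may be safer to restrict the substitution to lines that start with >. The following is only a minimal sketch under that assumption, and it also assumes each header contains exactly one underscore. The function name swap_header_sed is hypothetical and not part of any tool.

```shell
# Hypothetical helper: swap the two underscore-separated parts of FASTA headers.
# Assumption: each header looks like ">LEFT_RIGHT" with a single underscore.
swap_header_sed() {
  # The /^>/ address limits the substitution to header lines;
  # [^_]* stops at the first underscore instead of matching greedily.
  sed -E '/^>/ s/^>([^_]*)_(.*)$/>\2_\1/' "$@"
}

# Usage: reads the named files, or standard input when no file is given.
printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' 'ACGTACGT' | swap_header_sed
```

Sequence lines pass through unchanged, so the output can be redirected to a new file.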


This is also straightforward with awk, if you are open to using it:

awk 'BEGIN{FS=OFS="_"} {print $2,$1}'  Input_file

Or, if the lines in your Input_file start with >, use the following, which keeps the > at the front:

awk 'BEGIN{FS=OFS="_"} {print substr($1,1,1)$2,substr($1,2)}' Input_file
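Note that the awk commands above rebuild every line, so a sequence line without an underscore would be altered as well. Below is a minimal sketch that only touches header lines, assuming a FASTA-style file in which each header has exactly one underscore. The function name swap_header_awk is hypothetical.

```shell
# Hypothetical helper: swap header parts with awk, leaving sequence lines alone.
# Assumption: each header looks like ">LEFT_RIGHT" with a single underscore.
swap_header_awk() {
  awk '
    BEGIN { FS = OFS = "_" }
    # Header line: print ">" plus the second field, then the first field without its ">".
    /^>/ { print ">" $2, substr($1, 2); next }
    # Any other line (sequence data) is printed as it is.
    { print }
  ' "$@"
}

# Usage: reads the named files, or standard input when no file is given.
printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' 'ACGTACGT' | swap_header_awk
```

Headers with more than one underscore would lose their extra fields here, so check your names first.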

Upvotes: 3

Cyrus

Reputation: 88646

Use > and _ as field separators with awk:

awk 'BEGIN{FS="[>_]"}{print ">" $3 "_" $2}' file

Output:

>pienapg00000000172_PIENAPT00000000258

Upvotes: 2
