Reputation: 99
I am looking for a simple way to swap the order of the chunks in a sequence header name on UNIX.
The sequence name I have now is:
>PIENAPT00000000258_pienapg00000000172
But I need:
>pienapg00000000172_PIENAPT00000000258
I tried to do this with sed, but realised my approach is not the way to go, and unfortunately I have not found a solution yet. Can someone help me?
sed -E 's/PIENAPT.+\_pienapg.+/pienapg.+\_PIENAPT.+//'
Upvotes: 1
Views: 65
Reputation: 133528
Could you please try the following with sed specifically.
sed 's/\([^_]*\)\(_\)\(.*\)/\3\2\1/' Input_file
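To see why a second variant is needed, here is a small sketch that runs the command above on the header both with and without the leading >. The function name swap_first_underscore is hypothetical and only wraps the command for the demo. When the > is present it is captured in \1 together with PIENAPT..., so it ends up in the middle of the output.

```shell
#!/bin/sh
# swap_first_underscore: hypothetical wrapper around the first sed command.
# \1 = text before the first "_", \2 = the "_", \3 = everything after it.
swap_first_underscore() {
  sed 's/\([^_]*\)\(_\)\(.*\)/\3\2\1/'
}

# Without a leading ">": prints pienapg00000000172_PIENAPT00000000258
printf '%s\n' 'PIENAPT00000000258_pienapg00000000172' | swap_first_underscore

# With a leading ">": the ">" stays in \1 and moves to the middle,
# printing pienapg00000000172_>PIENAPT00000000258
printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' | swap_first_underscore
```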
OR, in case the lines of your Input_file start with > (which was NOT clear because of the quote tags), try the following instead.
sed 's/^\(>\)\([^_]*\)\(_\)\(.*\)/\1\4\3\2/' Input_file
OR, as per @potong sir's comment, try:
sed -E 's/>(.*)_(.*)/>\2_\1/' Input_file
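For comparison with the attempt in the question: the replacement half of s/// is plain text plus back-references, not a pattern, so the .+ written there would be copied literally, and the extra trailing / makes sed reject the command. Wrapping each chunk in (...) and pasting it back with \1 and \2 fixes that. A minimal sketch, assuming every header has the >PIENAPT..._pienapg... shape shown in the question; the function name fix_attempt and the ACGTACGT sequence line are made up for illustration.

```shell
#!/bin/sh
# fix_attempt: hypothetical name. The question's own pattern, rewritten
# with capture groups: \1 = the PIENAPT chunk, \2 = the pienapg chunk.
fix_attempt() {
  sed -E 's/^>(PIENAPT[^_]+)_(pienapg.+)$/>\2_\1/'
}

# The header is rewritten; the made-up sequence line does not match the
# pattern, so sed prints it unchanged.
printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' 'ACGTACGT' | fix_attempt
```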
Trust me, this should be pretty easy with awk too. Try the following if you are OK with using it.
awk 'BEGIN{FS=OFS="_"} {print $2,$1}' Input_file
OR, in case the lines of your Input_file start with > (which was NOT clear because of the quote tags), try the following instead.
awk 'BEGIN{FS=OFS="_"} {print substr($1,1,1)$2,substr($1,2)}' Input_file
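If Input_file is a FASTA file with sequence lines between the headers (an assumption; the question only shows a header), the awk commands above would also rewrite those lines. For example, ACGTACGT contains no _, so the last command prints A_CGTACGT. A sketch that limits the rewrite to lines starting with > (swap_headers is a hypothetical name):

```shell
#!/bin/sh
# swap_headers: hypothetical name. Same awk logic as the last command, but
# only lines starting with ">" are rewritten; all other lines print as-is.
swap_headers() {
  awk 'BEGIN{FS=OFS="_"} /^>/{print substr($1,1,1) $2, substr($1,2); next} {print}'
}

printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' 'ACGTACGT' | swap_headers
```

The `next` skips the final `{print}` for header lines, so each line is printed exactly once.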
Upvotes: 3
Reputation: 88646
Use > and _ as field separators with awk:
awk 'BEGIN{FS="[>_]"}{print ">" $3 "_" $2}' file
Output:
>pienapg00000000172_PIENAPT00000000258
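Why $3 and $2: because the header starts with >, which is itself one of the separators, awk's first field is empty. A small sketch that prints every field so the numbering is visible (show_fields is a hypothetical helper name):

```shell
#!/bin/sh
# show_fields: hypothetical helper. FS="[>_]" is a regex, so both ">" and
# "_" split fields; the leading ">" therefore produces an empty $1.
show_fields() {
  awk 'BEGIN{FS="[>_]"} {print "NF=" NF; for (i = 1; i <= NF; i++) print i ": [" $i "]"}'
}

# Prints NF=3, then 1: [], 2: [PIENAPT00000000258], 3: [pienapg00000000172]
printf '%s\n' '>PIENAPT00000000258_pienapg00000000172' | show_fields
```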
Upvotes: 2