Reputation: 41
I am looking for a way to reformat the header lines of a FASTA file using Bash commands, from
gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
to
F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
How would I do that in Bash?
Upvotes: 1
Views: 91
Reputation: 88674
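Try this. On any line matching F-box, it rebuilds the line from fields 3 to 5 (the protein description); every other line is printed unchanged: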
awk '/F-box/ {$0=$3" "$4" "$5} {print}' file
With this file:
gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
Output:
F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
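The one-liner above only touches headers containing F-box and assumes the description is always exactly three words. If the file holds other proteins too, a variant that strips by pattern may be more robust. This is only a sketch: clean_headers is a hypothetical helper name, and it assumes every header contains the text "PREDICTED: " and ends with a bracketed species name, as in the sample above.

```shell
# Hypothetical helper (not from the original answer): trim FASTA header lines
# down to the protein description. Assumes headers contain "PREDICTED: ".
clean_headers() {
  awk '
    /PREDICTED: / {
      sub(/^.*PREDICTED: /, "")   # drop everything up to and including "PREDICTED: "
      sub(/ \[.*\]$/, "")         # drop the trailing " [Species name]" tag, if present
    }
    { print }                     # sequence lines pass through unchanged
  ' "$@"
}
```

Call it as clean_headers file, and redirect the output to a new file if you want to keep the result. Headers without "PREDICTED: " are left untouched, so check the output if your file mixes header styles.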
Upvotes: 1