on9jai

Reputation: 41

Formatting FASTA files

I have been looking for a way to reformat the header lines of FASTA files using Bash commands, from

gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

to

F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

How would I do that in Bash?

Upvotes: 1

Views: 91

Answers (1)

Cyrus

Reputation: 88674

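Try this. On every line that matches F-box, awk replaces the whole line with its third, fourth and fifth whitespace-separated fields; all other lines are printed unchanged: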

awk '/F-box/ {$0=$3" "$4" "$5} {print}' file

With this file:

gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

Output:

F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
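
The one-liner above only touches headers that contain F-box and assumes the description is exactly three words long. If the file holds other proteins, a more general variant might look like the sketch below. It rests on assumptions that are not from the question: header lines are taken to be the lines containing a "|" (sequence lines never do), the description is whatever sits between the last "|" and a trailing "[organism]" note, and sed supports -E. The function name clean_fasta_headers is made up for illustration.

```shell
# Sketch only: strip the identifier prefix, an optional "PREDICTED:" tag and
# the trailing "[organism]" note from every header line read on stdin.
# Assumption: a header is any line containing "|"; other lines pass through.
clean_fasta_headers() {
  sed -E '/[|]/ {
    # drop everything up to and including the last "|"
    s/^.*[|][[:space:]]*//
    # drop the "PREDICTED:" tag if it is present
    s/^PREDICTED:[[:space:]]*//
    # drop a trailing "[...]" organism note
    s/[[:space:]]*\[[^]]*\][[:space:]]*$//
  }'
}
```

It could then be run as clean_fasta_headers < file > cleaned_file. Note that if the headers start with ">", this sketch removes that marker as well, which matches the output asked for in the question but leaves a file that is no longer valid FASTA.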

Upvotes: 1