Reputation: 57
How to add double quotes to the gene_id?
My original file:
##gtf-version 3
Bany_Scaf1 maker gene 201136 207903 . + . Alias "maker-Bany_Scaf1-snap-gene-2.23"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID Bany_03723; Name Bany_03723; Ontology_term "GO:0016714" "GO:0055114"; gene_id Bany_03723
Bany_Scaf1 maker transcript 201136 207903 . + . Alias "maker-Bany_Scaf1-snap-gene-2.23-mRNA-1"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID "Bany_03723-RA"; Name "Bany_03723-RA"; Ontology_term "GO:0016714" "GO:0055114"; Parent Bany_03723; _AED "0.06"; _QI "45|1|1|1|1|1|7|425|530"; _eAED "0.06"; gene_id Bany_03723; original_biotype mrna; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 201136 201304 . + . ID "Bany_03723-RA:1"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 202687 202770 . + . ID "Bany_03723-RA:2"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 202886 202921 . + . ID "Bany_03723-RA:3"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 203004 203820 . + . ID "Bany_03723-RA:4"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 206097 206223 . + . ID "Bany_03723-RA:5"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 206649 206878 . + . ID "Bany_03723-RA:6"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
Bany_Scaf1 maker exon 207304 207903 . + . ID "Bany_03723-RA:7"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"
I hope to change every gene_id Bany_xxxxxx to gene_id "Bany_xxxxxx".
I tried this:
sed -E 's#(Parent|gene_id|ID) ([0-9A-Za-z.]+)#\1 \"\2\"#g'
But the double quotes were added in the wrong place, like:
gene_id "Bany"_03723
What should I do?
Upvotes: 0
Views: 93
Reputation: 67567
With sed:
$ sed -E 's/gene_id ([^;]+)/gene_id "\1"/' file
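This matches the value that follows gene_id, up to the next ; (or the end of the line), and wraps it in double quotes. It assumes a single space between gene_id and its value; if they are separated by a tab, replace the space with \t.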
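For what it's worth, the attempt in the question stops at `Bany` because the character class `[0-9A-Za-z.]` does not include `_`. Below is a slightly more defensive sketch of the same idea, not a definitive solution: it excludes `"`, `;`, and whitespace from the captured value, so a gene_id that is already quoted is left alone and trailing whitespace is not pulled inside the quotes. The function name `quote_gene_id` is a hypothetical helper, and the two sample lines are abbreviated from the question.

```shell
# Hypothetical helper: quote bare gene_id values only.
# The value may not start with or contain '"', ';', or whitespace,
# so 'gene_id "X"' does not match and is left unchanged.
quote_gene_id() {
  sed -E 's/gene_id ([^";[:space:]]+)/gene_id "\1"/g' "$@"
}

# Demo on two abbreviated lines from the question (read from stdin).
printf '%s\n' \
  'Bany_Scaf1 maker exon 201136 201304 . + . ID "Bany_03723-RA:1"; gene_id Bany_03723; transcript_id "Bany_03723-RA"' \
  'Bany_Scaf1 maker gene 201136 207903 . + . ID Bany_03723; gene_id Bany_03723' \
  | quote_gene_id
```

To run it on a file, pass the file name as an argument, e.g. `quote_gene_id file > file.quoted`. Running it a second time on its own output should change nothing, since quoted values no longer match.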
Upvotes: 3