shenTTT

Reputation: 57

How to add double quotes in a specific column

How to add double quotes to the gene_id?

My original file:

##gtf-version 3
Bany_Scaf1  maker   gene    201136  207903  .   +   .   Alias "maker-Bany_Scaf1-snap-gene-2.23"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID Bany_03723; Name Bany_03723; Ontology_term "GO:0016714" "GO:0055114"; gene_id Bany_03723
Bany_Scaf1  maker   transcript  201136  207903  .   +   .   Alias "maker-Bany_Scaf1-snap-gene-2.23-mRNA-1"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID "Bany_03723-RA"; Name "Bany_03723-RA"; Ontology_term "GO:0016714" "GO:0055114"; Parent Bany_03723; _AED "0.06"; _QI "45|1|1|1|1|1|7|425|530"; _eAED "0.06"; gene_id Bany_03723; original_biotype mrna; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    201136  201304  .   +   .   ID "Bany_03723-RA:1"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    202687  202770  .   +   .   ID "Bany_03723-RA:2"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    202886  202921  .   +   .   ID "Bany_03723-RA:3"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    203004  203820  .   +   .   ID "Bany_03723-RA:4"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    206097  206223  .   +   .   ID "Bany_03723-RA:5"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    206649  206878  .   +   .   ID "Bany_03723-RA:6"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    207304  207903  .   +   .   ID "Bany_03723-RA:7"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 

I want to change every gene_id Bany_xxxxxx to gene_id "Bany_xxxxxx".

I tried this:

sed -E 's#(Parent|gene_id|ID) ([0-9A-Za-z.]+)#\1 \"\2\"#g'

But the double quotes were added in the wrong place, like:

gene_id "Bany"_03723

What should I do?

Upvotes: 0

Views: 93

Answers (1)

karakfa

Reputation: 67567

With sed:

$ sed -E 's/gene_id ([^;]+)/gene_id "\1"/' file

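This matches the value that follows gene_id, up to the next ; (or the end of the line), and wraps it in double quotes. It assumes a single space between gene_id and the value; if the separator is a tab, replace the space with \t (GNU sed understands \t). Your attempt stopped at the underscore because the character class [0-9A-Za-z.] does not include _.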
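If some gene_id values in the file might already be quoted, a slightly stricter variant may be safer. The sketch below is not the command above but an adaptation of it: it excludes double quotes and whitespace from the captured value, so an already quoted gene_id is left alone. The function name quote_gene_id is made up for illustration, and the sample line is one of the exon records from the question.

```shell
# Hypothetical helper (name is an assumption): read GTF lines on stdin and
# quote any gene_id value that is not already quoted.
quote_gene_id() {
  # [^";[:space:]]+ captures a run of characters that stops at a quote,
  # a semicolon, or whitespace, so `gene_id "X"` does not match at all.
  sed -E 's/gene_id ([^";[:space:]]+)/gene_id "\1"/g'
}

# Usage on a sample attribute string from the question.
printf '%s\n' 'ID "Bany_03723-RA:1"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA"' | quote_gene_id
```

To process a whole file, something like quote_gene_id < file > file.quoted should work; Name, ID and Parent are not touched because only the gene_id key is matched.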

Upvotes: 3
