zoe

Reputation: 311

awk multiple field separators?

I have a large file with lines like this:

chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";

I want to extract ENSG00000223972.5, DDX11L1, chr1, 11869 and 14409. I have managed to get the first two with:

awk 'BEGIN {FS="\""}; {print $2"\t"$6}' file.txt

I'm now struggling to extract chr1, 11869 and 14409, since that seems to need a different field separator. How can this be done on the same line?

Upvotes: 0

Views: 74

Answers (2)

Ed Morton

Reputation: 203512

$ awk -F'[ "]+' -v OFS='\t' '{print $1, $4, $5, $10, $16}' file
chr1    11869   14409   ENSG00000223972.5       DDX11L1
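To see where those field numbers come from, it can help to print every field next to its index. This is only a sketch: it assumes the sample line from the question is separated by spaces exactly as displayed (if the real file uses tabs, the bracket expression would need a tab as well), and the variable names `line` and `numbered` are made up for the illustration.

```shell
# Sample line from the question (assumed to be space-separated, as displayed).
line='chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";'

# Split on runs of spaces and/or double quotes, then print "index: value"
# for every field so the positions can be checked by eye.
numbered=$(printf '%s\n' "$line" |
    awk -F'[ "]+' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$numbered"
```

In the listing, the leftover `;` characters show up as fields of their own, which is why the quoted values land on 10 and 16 rather than on consecutive numbers.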

Upvotes: 1

CWLiu

Reputation: 4043

Try the following command to extract what you want:

awk 'BEGIN {FS="\"";OFS="\t"}; {split($1,a,/[\ ]*/); print a[1],a[4],a[5],$2,$6}' file.txt

Brief explanation:

  • split($1,a,/[\ ]*/): split $1 (everything before the first double quote) into the array a, using the regex /[\ ]*/ (runs of spaces) as the separator.
  • Print the elements of a that you need (a[1], a[4], a[5]), followed by $2 and $6.
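The same split() idea also works the other way round: keep awk's default whitespace splitting for the columns and call split() on the whole line with the double quote as separator. This is a sketch under assumptions, not a replacement for the command above; the array name `q` and the shell variables `line` and `result` are chosen only for the illustration, and it assumes the quoted values always appear in the order shown in the question.

```shell
# Sample line from the question.
line='chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";'

result=$(printf '%s\n' "$line" | awk -v OFS='\t' '{
    # Default FS gives the whitespace-separated columns in $1, $4, $5.
    # Splitting the whole line on the double quote puts the 1st and 3rd
    # quoted values into q[2] and q[6].
    split($0, q, "\"")
    print $1, $4, $5, q[2], q[6]
}')

printf '%s\n' "$result"
```

Because the default field separator treats both spaces and tabs as whitespace, this variant should not care which of the two the file uses between columns.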

Upvotes: 1
