justaguy

Reputation: 3022

using awk to parse a specific condition

I am trying to use awk to parse multiple conditions and am having some trouble with the first. I think the code below is close, but it does not return the desired output. Thank you :). The parse rules are:

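  1. The digits between the NC_ and the . with the leading zeros removed (usually 4 zeros, but not always)
  2. The digits after g. printed twice (g.### g.###)
  3. The base before the > (C in the example)
  4. The base after the > (T in the example)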

    awk -F"[_.>]" 'FNR>1 {X=$4+0; sub(X, "", $4); print $2+0, X, X, $4, $5}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

id_position.txt

Input Variant   Errors  Chromosomal Variant Coding Variant(s)
NM_004004.5:c.79G>A     NC_000013.10:g.20763642C>T  NM_004004.5:c.79G>A XM_005266354.1:c.79G>A  XM_005266355.1:c.79G>A  XM_005266356.1:c.79G>A

Desired output:

13     20763642     20763642     C     T

Upvotes: 0

Views: 85

Answers (1)

Jotne

Reputation: 41460

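This should do. It splits the second field on _, . and >, then separates the position digits from the trailing reference base in the fourth piece: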

awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4])-1);print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" file
13      20763642        20763642        C       T
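
If the file has more than one data row, or the NC_ entry is not always the second field (for example when the Errors column is filled in), a slightly more defensive version may be useful. This is only a sketch under those assumptions: the function name parse_variants is made up for illustration, and the pattern assumes every chromosomal variant looks like NC_<digits>.<version>:g.<position><ref>><alt> with single A/C/G/T bases.

```shell
# parse_variants: hypothetical helper name, not from the original post.
# Reads the table on stdin, skips the header row, and prints
# chromosome, position, position, ref base, alt base (tab-separated).
parse_variants() {
    awk -v OFS='\t' '
    FNR > 1 {
        for (i = 1; i <= NF; i++) {
            # Assumed shape: NC_<digits>.<version>:g.<position><ref>><alt>
            if ($i ~ /^NC_[0-9]+\.[0-9]+:g\.[0-9]+[ACGT]>[ACGT]$/) {
                split($i, a, "[_.>]")          # a[2]=000013  a[4]=20763642C  a[5]=T
                n   = length(a[4])
                pos = substr(a[4], 1, n - 1)   # digits before the ref base
                ref = substr(a[4], n)          # last character, the ref base
                print a[2] + 0, pos, pos, ref, a[5]   # +0 drops leading zeros
                break                          # use the first NC_ field only
            }
        }
    }'
}
```

It can be called as parse_variants < ${id}_position.txt > ${id}_parse.txt. Rows without a matching NC_ field are skipped silently, which may or may not be what you want.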

Upvotes: 0
