Reputation: 3022
This post is a continuation of "using awk to parse specific condition". I apologize if I should have added this to that thread instead. I have tried to modify the awk script below, but with no luck:
awk 'NR==2 {
split($2,a,"[_.>]");b=substr(a[4],1,length(a[4])-1);
print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' \
OFS="\t" "${id}_position.txt" > "${id}_parse.txt"
A user can enter several possible conditions, each of which results in different output. One of those conditions is shown in the data sample below; the field in bold is the one that needs to be parsed:
` parse rules:
1. the digits after NC_ and before the . , without the leading zeros (4 zeros here, but that is not always the case)
2. the number after g. (before the underscore) and the number after the underscore
3. TG (the letters after del)
4. - (a hyphen is used in this position)`
Data Sample
Input Variant Errors Chromosomal Variant Coding Variant(s)
NM_004004.5:c.575_576delCA **NC_000013.10:g.20763145_20763146delTG** NM_004004.5:c.575_576delCA XM_005266354.1:c.575_576delCA XM_005266355.1:c.575_576delCA XM_005266356.1:c.575_576delCA
Desired Output
13 20763145 20763146 TG -
Thank you :).
Upvotes: 1
Views: 748
Reputation: 58578
TXR Language:
Input Variant@(skip)
@(skip)NC_@{nc-raw}.@(skip)g.@{g-left}_@{g-right}del@{letters 2}@(skip)
@(bind nc-num @(int-str nc-raw))
@(output)
@{nc-num 6} @{g-left 12} @{g-right 12} @{letters 6} -
@(end)
Run:
$ txr nc.txr data
13 20763145 20763146 TG -
All on the command line:
$ txr -c 'Input Variant@(skip)
@(skip)NC_@{nc-raw}.@(skip)g.@{g-left}_@{g-right}del@{letters 2}@(skip)
@(bind nc-num @(int-str nc-raw))
@(output)
@{nc-num 6} @{g-left 12} @{g-right 12} @{letters 6} -
@(end)' data
13 20763145 20763146 TG -
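For comparison, roughly the same extraction can be sketched in awk, which is closer to the script in the question. This is only a sketch under assumptions: the function name `parse_del_variant` is made up for illustration, the variant of interest is taken to be the second whitespace-separated field of the second line (as in the original script), and only the `del` form from the sample is handled. Any other condition is skipped silently.

```shell
# Hypothetical helper: reads the report on stdin and prints the parsed
# deletion variant as tab-separated fields.
parse_del_variant() {
  awk -v OFS='\t' '
    # Line 2 holds the data; field 2 is the chromosomal variant, e.g.
    # NC_000013.10:g.20763145_20763146delTG
    NR == 2 && $2 ~ /^NC_[0-9]+\.[0-9]+:g\.[0-9]+_[0-9]+del[A-Z]+$/ {
      # Split on "_", ".", ":" and the literal "del":
      # a[1]=NC a[2]=000013 a[3]=10 a[4]=g a[5]=start a[6]=end a[7]=letters
      split($2, a, "[_.:]|del")
      # a[2]+0 drops the leading zeros; "-" is the fixed last column.
      print a[2] + 0, a[5], a[6], a[7], "-"
    }'
}

# Usage with the sample from the question (shortened to three fields).
printf '%s\n' \
  'Input Variant Errors Chromosomal Variant Coding Variant(s)' \
  'NM_004004.5:c.575_576delCA NC_000013.10:g.20763145_20763146delTG NM_004004.5:c.575_576delCA' |
  parse_del_variant
```

With the sample line this should print 13, 20763145, 20763146, TG and - separated by tabs. Splitting on a regular expression that includes the literal `del` avoids the `substr` arithmetic, at the cost of having to add a separate pattern for each other condition.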
Upvotes: 2