justaguy

Reputation: 3022

awk not changing numbering in output file

The awk command below is supposed to filter $8 of the tab-delimited input using each line in gene, then number each line sequentially, skipping the header. I think it is filtering the input but not numbering correctly. The desired output is just the filtered input but with $1 (R_Index) numbered sequentially. Thank you :).

input

R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene
11  chr1    1147422 1147422 C   T   exonic  TNFRSF4
12  chr1    1168180 1168180 G   C   exonic  B3GALT6

contents of gene

TNFRSF4
B3GALT6

current output (the line ending in --- is the header row)

R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene ---
11  chr1    1147422 1147422 C   T   exonic  TNFRSF4
12  chr1    1168180 1168180 G   C   exonic  B3GALT6

desired output

R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene
1   chr1    1147422 1147422 C   T   exonic  TNFRSF4
2   chr1    1168180 1168180 G   C   exonic  B3GALT6

awk

awk 'NR==FNR{for (i=1;i<=NF;i++) a[$i];next} FNR==1 || ($8 in a)' gene input | awk '{split($2,a,"-"); print a[1] "\t" $0}' | cut -f2- > output

Upvotes: 0

Views: 40

Answers (1)

Ed Morton

Reputation: 203995

Your question isn't clear but this MIGHT be what you want:

awk 'NR==FNR{a[$0];next} FNR==1{print} $8 in a{$1=++c; print}' gene input 
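One thing to watch for: assigning to $1 makes awk rebuild the record using OFS, which defaults to a single space, so the renumbered rows would come out space-separated rather than tab-separated. It also explains the unchanged numbers in the question: the second awk in that pipeline prepends a[1] as a new first column and cut -f2- then strips it again, so $1 is never touched. Below is a minimal sketch of the same approach with FS and OFS set to a tab and the result written to output. The temporary directory and the OTHERGENE row are hypothetical, added only so the example is self-contained and shows a row being filtered out; the other rows are copied from the question.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
tmpdir=$(mktemp -d)

# Recreate the sample files from the question (tab-delimited input, one gene per line).
printf 'R_Index\tChr\tStart\tEnd\tRef\tAlt\tFunc.IDP.refGene\tGene.IDP.refGene\n' > "$tmpdir/input"
printf '11\tchr1\t1147422\t1147422\tC\tT\texonic\tTNFRSF4\n' >> "$tmpdir/input"
# Hypothetical extra row (not in the question) whose gene is absent from the gene file.
printf '99\tchr1\t1150000\t1150000\tA\tG\texonic\tOTHERGENE\n' >> "$tmpdir/input"
printf '12\tchr1\t1168180\t1168180\tG\tC\texonic\tB3GALT6\n' >> "$tmpdir/input"
printf 'TNFRSF4\nB3GALT6\n' > "$tmpdir/gene"

awk 'BEGIN { FS = OFS = "\t" }        # read and write tab-separated fields
     NR == FNR { a[$0]; next }         # first file: remember each gene name
     FNR == 1  { print; next }         # second file: pass the header through unchanged
     $8 in a   { $1 = ++c; print }     # matching rows: renumber column 1 starting at 1
' "$tmpdir/gene" "$tmpdir/input" > "$tmpdir/output"

cat "$tmpdir/output"
```

The explicit next on the header rule only saves a pointless lookup of the header's $8 in a; the filtering and numbering logic is otherwise the same as in the one-liner above.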

Upvotes: 2
