Reputation: 3022
The awk below is supposed to filter $8 of the tab-delimited input using each line in gene, then number each line sequentially, skipping the header. I think it is filtering the input but not numbering correctly. The desired output is just the filtered input but with $1 (R_Index) sequentially numbered. Thank you :).
input
R_Index Chr Start End Ref Alt Func.IDP.refGene Gene.IDP.refGene
11 chr1 1147422 1147422 C T exonic TNFRSF4
12 chr1 1168180 1168180 G C exonic B3GALT6
contents of gene
TNFRSF4
B3GALT6
current output (the first line is the header row)
R_Index Chr Start End Ref Alt Func.IDP.refGene Gene.IDP.refGene
11 chr1 1147422 1147422 C T exonic TNFRSF4
12 chr1 1168180 1168180 G C exonic B3GALT6
desired output
R_Index Chr Start End Ref Alt Func.IDP.refGene Gene.IDP.refGene
1 chr1 1147422 1147422 C T exonic TNFRSF4
2 chr1 1168180 1168180 G C exonic B3GALT6
awk
awk 'NR==FNR{for (i=1;i<=NF;i++) a[$i];next} FNR==1 || ($8 in a)' gene input | awk '{split($2,a,"-"); print a[1] "\t" $0}' | cut -f2-> output
Upvotes: 0
Views: 40
Reputation: 203995
Your question isn't clear but this MIGHT be what you want:
awk 'NR==FNR{a[$0];next} FNR==1{print} $8 in a{$1=++c; print}' gene input
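If the input really is tab-delimited, as the question says, a variant that sets FS and OFS to a tab might be safer: assigning to $1 makes awk rebuild the record using OFS, which is a single space by default, so the renumbered rows would otherwise come out space-separated while the header keeps its tabs. A minimal sketch, assuming the sample files from the question are recreated with printf (the printf lines are only there to make the example self-contained):

```shell
# Recreate the sample files from the question (assumed to be tab-delimited).
printf 'TNFRSF4\nB3GALT6\n' > gene
printf 'R_Index\tChr\tStart\tEnd\tRef\tAlt\tFunc.IDP.refGene\tGene.IDP.refGene\n' > input
printf '11\tchr1\t1147422\t1147422\tC\tT\texonic\tTNFRSF4\n' >> input
printf '12\tchr1\t1168180\t1168180\tG\tC\texonic\tB3GALT6\n' >> input

# FS=OFS="\t": split on tabs and rejoin with tabs when $1 is reassigned.
# First file (gene): store each gene name as an array key.
# Second file (input): print the header unchanged, then renumber $1 on matching rows.
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$0];next} FNR==1{print;next} $8 in a{$1=++c; print}' gene input > output
cat output
```

The added next after the header print only guards against the header line being tested against the gene list as well; drop it if that is not a concern.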
Upvotes: 2