Reputation: 3022
Thanks to @karakfa, the awk array below produces the current output. I am trying to add $2 to the array and output that as well. $2 is basically the number of times the unique entry appears. As I am learning awk arrays, I do not know if my attempt is close.
Input:
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1
My script:
awk '{k=$1 OFS $2;
l=$2; # Is this correct?
s[k]+=$4; c[k]++}
END{for(i in s) # Is this correct?
print i, s[i]/c[i]},
"(lbases)" # Is this correct?' input
Current output:
chr1:955542-955763 AGRN:exon.1 0
chr1:985542-985763 AGRN:exon.2 0.5
Desired output:
chr1:955542-955763 AGRN:exon.1 0 (2 bases)
chr1:985542-985763 AGRN:exon.2 0.5 (2 bases)
Upvotes: 3
Views: 228
Reputation: 189317
Your attempt to introduce a new variable is not going to work. You need a count per array key, so the variable should be another array. But in this case, you don't need to add a new array, because the array c already contains the count per key.
awk '{k=$1 OFS $2;
s[k]+=$4; c[k]++}
END{for(i in s)
print i, s[i]/c[i], c[i] " bases" }' input
Notice also that your attempt placed the "(lbases)" string outside the closing brace of the END block, where it is no longer part of the print statement.
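The command above prints the count as "2 bases", without the parentheses shown in the desired output. A minimal sketch of one way to get the parentheses, assuming the four sample lines from the question as input (the shell variables input and out are only for illustration, and the sort is there because the order of a for (i in s) loop is unspecified in awk):

```shell
# Sample data copied from the question; "input" and "out" are illustrative names.
input='chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1'

out=$(printf '%s\n' "$input" |
  awk '{ k = $1 OFS $2      # key: region plus exon name
         s[k] += $4         # running sum of column 4 per key
         c[k]++ }           # number of lines seen per key
       END { for (i in s)
               print i, s[i]/c[i], "(" c[i] " bases)" }' |
  sort)                     # for-in order is unspecified, so sort for stable output

printf '%s\n' "$out"
```

The string pieces "(" c[i] " bases)" are joined by awk's implicit concatenation, so they arrive at print as a single field after the average.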
This differs from the problem description in that the key is not $2, but the combination of $1 and $2. If you genuinely need the key to be solely $2, you do need a new array, but then the whole thing will get quite a bit more complex.
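For illustration only, here is a rough sketch of what keying on $2 alone might look like. It rests on an assumption the question does not confirm: that it is acceptable to print the first $1 seen for each $2. The extra array r is a hypothetical name introduced here to hold that value.

```shell
# Sample data copied from the question; "input" and "out" are illustrative names.
input='chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1'

out=$(printf '%s\n' "$input" |
  awk '{ k = $2                         # key on the exon name only
         s[k] += $4                     # running sum of column 4 per key
         c[k]++                         # number of lines seen per key
         if (!(k in r)) r[k] = $1 }     # assumption: keep the first region seen
       END { for (i in s)
               print r[i], i, s[i]/c[i], "(" c[i] " bases)" }' |
  sort)                                 # for-in order is unspecified

printf '%s\n' "$out"
```

With the sample input every $2 maps to a single $1, so the result is the same as with the combined key. The two approaches would only diverge if the same exon name appeared with different regions.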
Upvotes: 4