Reputation: 784
I have written code to calculate a z-score. It computes the mean and standard deviation from one file and applies them to values taken from the rows of another file, as follows:
mean=$(awk '{total += $2; count++} END {print total/count}' ABC_avg.txt)
#calculating mean of the second column of the file
std=$(awk '{x[NR]=$2; s+=$2; n++} END{a=s/n; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/n); print sd}' ABC_avg.txt)
#calculating standard deviation from the second column of the same file
awk '{if (std) print $2-$mean/$std}' ABC_splicedavg.txt > ABC.tmp
#calculate the zscore for each row and store it in a temporary file
zscore=$(awk '{total += $0; count++} END {if (count) print total/count}' ABC.tmp)
#calculate an average of all the zscores in the rows and store it in a variable
echo $motif" "$zscore
rm ABC.tmp
However, when I run this code, I get the error "fatal: division by zero attempted" at the step where the temporary file is created. What is the right way to implement this? I also tried bc -l, but it prints the floating-point result with a very long string of decimal places. Thanks in advance.
Upvotes: 0
Views: 5262
Reputation: 67467
Here is a script that computes the mean and standard deviation in one pass. The sqsum/NR - mean^2 formula can lose some precision; if that is not acceptable, there are alternatives.
$ awk '{print rand()}' <(seq 100) |
  awk '{sum+=$1; sqsum+=$1^2}
       END{print mean=sum/NR, std=sqrt(sqsum/NR-mean^2), z=mean/std}'
0.486904 0.321789 1.51312
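If the precision loss matters, one commonly used alternative is Welford's online algorithm. This is an assumption on my part, since the answer does not name a specific alternative. It still makes a single pass, but it updates the running mean and the sum of squared deviations incrementally instead of subtracting two large, nearly equal numbers at the end. Below is a minimal sketch in awk. The name welford_stats is a hypothetical helper, and like the script above it reports the population standard deviation (dividing by N):

```shell
# welford_stats: hypothetical helper name, not part of the original answer.
# Welford's online algorithm: for each value, update the running mean and m2
# (sum of squared deviations from the current mean), one record at a time.
# Prints "mean std", where std is the population standard deviation (m2 / n).
welford_stats() {
  awk '{ n++; d = $1 - mean; mean += d / n; m2 += d * ($1 - mean) }
       END { if (n > 0) print mean, sqrt(m2 / n) }' "$@"
}

printf '%s\n' 2 4 4 4 5 5 7 9 | welford_stats
```

For this sample data it prints "5 2", which matches the direct population formulas (mean 40/8 = 5, variance 32/8 = 4).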
Your z-score calculation for each sample is also wrong: $2-$mean/$std divides before it subtracts. You need ($2-mean)/std.
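A related problem in the question's script is that the shell variables $mean and $std are not expanded inside the single-quoted awk program. Inside awk, $mean is a field reference, and mean and std are unset awk variables. A minimal sketch of the whole calculation, assuming the same layout as the question (values in column 2 of both files), passes the shell values in with awk's -v option and averages the z-scores without a temporary file. The function name zscore_avg is hypothetical:

```shell
# zscore_avg REF TARGET: hypothetical helper name, not from the question.
# Mean and population standard deviation come from column 2 of REF (as in the
# question); each column-2 value in TARGET becomes (x - mean) / std, and the
# z-scores are averaged directly, with no temporary file.
zscore_avg() {
  mean=$(awk '{ total += $2; count++ } END { if (count) printf "%.17g\n", total / count }' "$1")
  std=$(awk -v mean="$mean" '{ ss += ($2 - mean) ^ 2; n++ } END { if (n) printf "%.17g\n", sqrt(ss / n) }' "$1")
  # Shell variables are not expanded inside single-quoted awk code (there, $mean
  # is a field reference), so pass the values in with -v.
  awk -v mean="$mean" -v std="$std" '
    std + 0 == 0 { exit }                     # zero spread: z-score undefined, print nothing
    { total += ($2 - mean) / std; count++ }   # parentheses: subtract before dividing
    END { if (count) print total / count }' "$2"
}
```

With the question's files it would be called as zscore=$(zscore_avg ABC_avg.txt ABC_splicedavg.txt). The explicit zero check replaces the original if (std) guard, which only tested an unset awk variable.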
Upvotes: 2
Reputation: 246754
You can control the precision of your output with bc by using the scale variable:
$ echo "4/7" | bc -l
.57142857142857142857
$ echo "scale=3; 4/7" | bc -l
.571
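Note that bc's scale truncates the extra digits rather than rounding them. If you want a rounded result, awk's printf can format the value directly, which also avoids calling bc at all. Here is a small comparison, assuming a POSIX awk and bc on the PATH:

```shell
# bc: scale=3 keeps three decimal digits by truncation
echo "scale=3; 2/3" | bc -l               # .666
# awk: printf "%.3f" rounds to three decimal digits
awk 'BEGIN { printf "%.3f\n", 2/3 }'      # 0.667
```

In the z-score script, replacing the final print with printf "%.3f\n", total / count would give the rounded form.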
Upvotes: 1