I have a file with a column of numbers:
4.685
5.440
5.751
4.685
3.979
In my shell script I would like to process many of these files and get the mean and standard deviation of each.
I can get the mean easily enough with awk:
awk '{sum+=$1} END { print sum/NR}' file
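Since the goal is to run this over many files from a shell script, the one-liner could be wrapped in a loop. This is a minimal sketch that assumes one number per line in column 1. The function name file_means is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <mean>" for each file given as an argument.
# Assumes every file holds one number per line in column 1.
file_means() {
    for f in "$@"; do
        # sum/NR is the mean; a file with no lines produces an empty mean.
        mean=$(awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }' "$f")
        printf '%s %s\n' "$f" "$mean"
    done
}
```

For example, file_means *.txt would print one "name mean" line per file. The *.txt glob is only an illustration.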
For the standard deviation, I use:
awk '{x[NR]=$0; s+=$0} END{a=s/NR; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/NR); print sd}' file
I get 0.625. This number differs from Excel, which gives me 0.699.
I have since discovered that I can run R from the command line to print the sd:
R -q -e "x <- read.csv('file', header = F); sd(x[ , 1])"
However, this gives slightly messy output:
[1] 4.908
>
>
Can I adjust the R command to print out only the number without resorting to head and cut/awk?
Also, what is wrong with my awk code for computing the standard deviation?
I can't quite tell what's wrong with your awk, but for the R command, you might find that the write() function helps:
R -q -e "x<- read.csv('file.csv',header=FALSE)[,1] ; write(sd(x),file='result.txt')"
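As for the awk, it is very likely not wrong, just computing a different statistic. It divides the sum of squared deviations by NR, which gives the population standard deviation. Excel's STDEV (and STDEV.S), like R's sd(), divides by n-1 instead, which gives the sample standard deviation. For the five values in the question, the squared deviations from the mean 4.908 sum to 1.956172. Then sqrt(1.956172 / 5) ≈ 0.625 and sqrt(1.956172 / 4) ≈ 0.699, which matches both figures. Below is a minimal sketch of the sample version that keeps the question's two-pass approach. The function name sample_sd is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sample standard deviation (divisor n - 1) of column 1
# of a single file, matching Excel's STDEV and R's sd().
sample_sd() {
    awk '
        { x[NR] = $1; sum += $1 }      # keep every value and the running sum
        END {
            if (NR < 2) {               # n - 1 would be zero
                print "need at least two values" > "/dev/stderr"
                exit 1
            }
            mean = sum / NR
            for (i = 1; i <= NR; i++)
                ss += (x[i] - mean) ^ 2
            printf "%.3f\n", sqrt(ss / (NR - 1))
        }' "$1"
}
```

For the sample data, sample_sd file should print 0.699. To reproduce Excel's STDEV.P (population) figure instead, divide by NR as the original code does.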