PrincessJellyfish

Reputation: 149

Standard deviation depending on variable

My data frame looks like this in R (but much bigger):

x<-c(1,2,3,4,5,6)
y<-c(2,5,3,4,9,63)
run<-c(1,1,2,2,1,1)
studie<-c("stu1","stu1","stu1","stu1","stu2","stu2")
df<-data.frame(x,y,run,studie)

I want to calculate the standard deviation of each column (in this case just x and y) for each run within each studie, then for each studie, and finally for the whole column. That is a bit confusing to explain, so for x in this example it would be the sd of:

(1,2) since they both are in studie 1 and on run 1,
(3,4) since they both are in studie 1 and on run 2,
(5,6) since they both are in studie 2 and on run 1,
(1,2,3,4) since they are in studie 1 
(5,6) since they are in studie 2
(1,2,3,4,5,6) since they are in column 1.

I think I should use an apply function, but I can't figure out how it works.
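A minimal base R sketch of how an apply-family function fits here, assuming the example df from above: split() builds exactly the groups listed for x, and sapply() then applies sd() to each group. The variable names (groups, sd_by_run, sd_by_studie, sd_overall) are illustrative only.

```r
# Example data from the question
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# split() groups x by every studie/run combination;
# drop = TRUE discards the empty stu2/run 2 combination
groups <- split(df$x, list(df$studie, df$run), drop = TRUE)

# sapply() applies sd() to each group and returns a named vector
sd_by_run <- sapply(groups, sd)

# The same idea grouped by studie only, and for the whole column
sd_by_studie <- sapply(split(df$x, df$studie), sd)
sd_overall <- sd(df$x)
```

For y, replace df$x with df$y; the answers below show ways to handle both columns at once.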

Upvotes: 0

Views: 97

Answers (3)

akrun

Reputation: 887721

We can use data.table. For x, grouped by studie and run:

library(data.table)
setDT(df)[, .(Sd= sd(x)) , by = .(studie, run)]

For both columns, use lapply over .SD after specifying .SDcols as 'x' and 'y':

setDT(df)[, lapply(.SD, sd), by = .(studie, run), .SDcols = x:y]
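The question also asks for the sd per studie and for the whole column. A sketch of the same data.table pattern at all three levels, assuming the example data (rebuilt here as dt so the original df is left untouched); only the by argument changes, and dropping it collapses the table to one row.

```r
library(data.table)

# Example data from the question, as a data.table
dt <- data.table(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

cols <- c("x", "y")

# One row per studie/run combination
sd_run <- dt[, lapply(.SD, sd), by = .(studie, run), .SDcols = cols]

# One row per studie
sd_studie <- dt[, lapply(.SD, sd), by = studie, .SDcols = cols]

# No 'by': a single row covering the whole columns
sd_all <- dt[, lapply(.SD, sd), .SDcols = cols]
```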

Upvotes: 1

bgoldst

Reputation: 35324

In base R, you can use aggregate() for the grouped results and sapply() for the whole columns:

aggregate(cbind(x,y)~run+studie,df,sd);
##   run studie         x          y
## 1   1   stu1 0.7071068  2.1213203
## 2   2   stu1 0.7071068  0.7071068
## 3   1   stu2 0.7071068 38.1837662
aggregate(cbind(x,y)~studie,df,sd);
##   studie         x         y
## 1   stu1 1.2909944  1.290994
## 2   stu2 0.7071068 38.183766
sapply(df[c('x','y')],sd);
##         x         y
##  1.870829 23.963862

If you want to parameterize the target columns, use the non-formula interface of aggregate() instead:

vars <- c('x','y');
aggregate(df[vars],df[c('run','studie')],sd);
##   run studie         x          y
## 1   1   stu1 0.7071068  2.1213203
## 2   2   stu1 0.7071068  0.7071068
## 3   1   stu2 0.7071068 38.1837662
aggregate(df[vars],df['studie'],sd);
##   studie         x         y
## 1   stu1 1.2909944  1.290994
## 2   stu2 0.7071068 38.183766
sapply(df[vars],sd);
##         x         y
##  1.870829 23.963862
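Another base R option, shown here as a sketch on the example data, is tapply(), which returns the grouped sds as a studie-by-run matrix. Combinations with no rows (stu2 with run 2 here) come back as NA rather than being dropped; the names sd_x and sd_y are illustrative.

```r
# Example data from the question
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# Rows are studies, columns are runs; empty cells are filled with NA
sd_x <- tapply(df$x, list(studie = df$studie, run = df$run), sd)
sd_y <- tapply(df$y, list(studie = df$studie, run = df$run), sd)
```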

Upvotes: 3

adaien

Reputation: 1942

When grouping by studie and run:

library(dplyr)
df %>% group_by(studie, run) %>% summarise(Sd = sd(x))

When grouping by studie:

df %>% group_by(studie) %>% summarise(Sd= sd(x))

For the whole column:

sd(df$x)
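To cover x and y together, a sketch using across() inside summarise(), assuming dplyr 1.0.0 or later (where across() and the .groups argument were introduced). The result names by_run, by_studie and overall are illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# across() applies sd() to each listed column within each group
by_run <- df %>%
  group_by(studie, run) %>%
  summarise(across(c(x, y), sd), .groups = "drop")

by_studie <- df %>%
  group_by(studie) %>%
  summarise(across(c(x, y), sd))

# Without group_by(), summarise() collapses the whole data frame to one row
overall <- df %>% summarise(across(c(x, y), sd))
```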

Upvotes: 0
