Reputation: 149
My data frame looks like this in R (but much bigger):
x<-c(1,2,3,4,5,6)
y<-c(2,5,3,4,9,63)
run<-c(1,1,2,2,1,1)
studie<-c("stu1","stu1","stu1","stu1","stu2","stu2")
df<-data.frame(x,y,run,studie)
I want to calculate the standard deviation of each column (in this case just x and y) at three levels: for each run within each studie, for each studie, and finally for the whole column. That is a bit confusing to explain, but for x it would be one sd for each of these groups:
(1,2) since they both are in studie 1 and on run 1,
(3,4) since they both are in studie 1 and on run 2,
(5,6) since they both are in studie 2 and on run 1,
(1,2,3,4) since they are in studie 1
(5,6) since they are in studie 2
(1,2,3,4,5,6) since that is the whole column.
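To make the grouping concrete, the six standard deviations listed above for x can be checked by plain logical subsetting. This is only a hand check of which values fall in each group, not a scalable approach; the data frame is recreated so the block runs on its own.

```r
# Recreate the example data from the question
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# One sd per (studie, run) group
sd_stu1_run1 <- sd(df$x[df$studie == "stu1" & df$run == 1])  # sd of c(1, 2)
sd_stu1_run2 <- sd(df$x[df$studie == "stu1" & df$run == 2])  # sd of c(3, 4)
sd_stu2_run1 <- sd(df$x[df$studie == "stu2" & df$run == 1])  # sd of c(5, 6)

# One sd per studie
sd_stu1 <- sd(df$x[df$studie == "stu1"])  # sd of c(1, 2, 3, 4)
sd_stu2 <- sd(df$x[df$studie == "stu2"])  # sd of c(5, 6)

# One sd for the whole column
sd_all <- sd(df$x)                        # sd of c(1, 2, 3, 4, 5, 6)
```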
I think I should use an apply function, but I can't figure out how it works.
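Since the question mentions the apply family, here is a minimal base-R sketch using tapply() for the grouped levels and sapply() for the whole columns. It assumes the data look like the example above and is one possible approach, not code from any of the answers.

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

vars <- c("x", "y")

# For each column: a matrix of sds with studie in the rows and run in the
# columns; combinations with no rows (stu2, run 2) come out as NA
by_studie_run <- lapply(df[vars], function(col) tapply(col, list(df$studie, df$run), sd))

# For each column: a vector of sds named by studie
by_studie <- lapply(df[vars], function(col) tapply(col, df$studie, sd))

# sd of each whole column
overall <- sapply(df[vars], sd)
```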
Upvotes: 0
Reputation: 887721
We can use data.table
library(data.table)
setDT(df)[, .(Sd= sd(x)) , by = .(studie, run)]
and for both columns, use lapply after specifying the .SDcols as 'x' and 'y'.
setDT(df)[, lapply(.SD, sd), by = .(studie, run), .SDcols = x:y]
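The calls above cover only the run-within-studie level. A sketch of the other two levels the question asks for, under the same data.table approach: drop run from by for the per-studie sds, and drop by entirely for the whole columns. It builds a fresh data.table with data.table() instead of setDT() so the original df is not converted in place.

```r
library(data.table)

dt <- data.table(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# sd per studie and run
by_studie_run <- dt[, lapply(.SD, sd), by = .(studie, run), .SDcols = c("x", "y")]

# sd per studie: group on studie only
by_studie <- dt[, lapply(.SD, sd), by = studie, .SDcols = c("x", "y")]

# sd of each whole column: no grouping at all
overall <- dt[, lapply(.SD, sd), .SDcols = c("x", "y")]
```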
Upvotes: 1
Reputation: 35324
In base R, you can use aggregate() and then sapply():
aggregate(cbind(x,y)~run+studie,df,sd);
## run studie x y
## 1 1 stu1 0.7071068 2.1213203
## 2 2 stu1 0.7071068 0.7071068
## 3 1 stu2 0.7071068 38.1837662
aggregate(cbind(x,y)~studie,df,sd);
## studie x y
## 1 stu1 1.2909944 1.290994
## 2 stu2 0.7071068 38.183766
sapply(df[c('x','y')],sd);
## x y
## 1.870829 23.963862
Also, just in case you want to parameterize the target columns (this requires the non-formula interface of aggregate()):
vars <- c('x','y');
aggregate(df[vars],df[c('run','studie')],sd);
## run studie x y
## 1 1 stu1 0.7071068 2.1213203
## 2 2 stu1 0.7071068 0.7071068
## 3 1 stu2 0.7071068 38.1837662
aggregate(df[vars],df['studie'],sd);
## studie x y
## 1 stu1 1.2909944 1.290994
## 2 stu2 0.7071068 38.183766
sapply(df[vars],sd);
## x y
## 1.870829 23.963862
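If the three levels are needed together, the parameterized calls can be wrapped in one function that returns them as a list. The name sd_by_levels and the list element names are hypothetical, chosen only for this sketch.

```r
# Hypothetical helper: sd of the chosen columns at three grouping levels
sd_by_levels <- function(data, vars) {
  list(
    studie_run = aggregate(data[vars], data[c("run", "studie")], sd),  # run within studie
    studie = aggregate(data[vars], data["studie"], sd),                 # per studie
    overall = sapply(data[vars], sd)                                    # whole columns
  )
}

df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

res <- sd_by_levels(df, c("x", "y"))
```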
Upvotes: 3
Reputation: 1942
When grouping with respect to studie and run
library(dplyr)
df %>% group_by(studie,run) %>% summarise(Sd= sd(x))
When grouping with respect to studie
df %>% group_by(studie) %>% summarise(Sd= sd(x))
For the whole column
sd(df$x)
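The calls above handle only x. A sketch for both columns at once, assuming dplyr 1.0.0 or later (where across() and the .groups argument of summarise() are available):

```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 5, 3, 4, 9, 63),
  run = c(1, 1, 2, 2, 1, 1),
  studie = c("stu1", "stu1", "stu1", "stu1", "stu2", "stu2")
)

# sd of x and y per studie and run
by_studie_run <- df %>%
  group_by(studie, run) %>%
  summarise(across(c(x, y), sd), .groups = "drop")

# sd of x and y per studie
by_studie <- df %>%
  group_by(studie) %>%
  summarise(across(c(x, y), sd), .groups = "drop")

# sd of x and y over the whole columns
overall <- df %>% summarise(across(c(x, y), sd))
```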
Upvotes: 0