Reputation: 3805
Some sample data
I have three lists
loc <- c("A","A","A","B","B","B")
sub.loc <- c(1,2,3,1,2,3)
set.seed(123)
df1 <- as.data.frame(cbind(loc,sub.loc, round(rnorm(6),digits =2)))
df2 <- as.data.frame(cbind(loc,sub.loc, round(rnorm(6),digits =2)))
df3 <- as.data.frame(cbind(loc,sub.loc, round(rnorm(6),digits =2)))
list.name <- list(df1,df2,df3)
I want to produce a single file that has mean and sd of the third column V3
.
Something like:
loc sub.loc V3 v4
A 1 mean(c(-0.56,0.46,0.4)) sd(c(-0.56,0.46,0.4))
A 2 mean(c(-0.23,-1.27,0.11)) sd(c(-0.23,-1.27,0.11))
A 3 mean(c(-0.56,-0.69, 1.56)) sd(c(-0.56,-0.69, 1.56))
B 1 mean(c(0.07,-0.45,1.79)) sd(c(0.07,-0.45,1.79))
B 2 mean(c(0.13,1.22,0.5)) sd(c(0.13,1.22,0.5))
B 3 mean(c(1.72,0.36,-1.97)) sd(c(1.72,0.36,-1.97))
My actual data in column `V3`` has NAs
I thought of using lapply
lapply(list.name, function(x) mean(x, na.rm = T))
lapply(list.name, function(x) sd(x, na.rm = T))
But both of them give me NAs
Upvotes: 1
Views: 1396
Reputation: 107567
Currently, you are running mean()
and sd()
across entire dataframe which contains non-numeric columns and hence NA
.
Consider base R's by
(object-oriented wrapper of tapply
) where you first stack list of dataframes with rbind
and then run aggregations across loc and sub.loc groupings:
stackdf <- do.call(rbind, list.name)
stackdf
# loc sub.loc V3
# 1 A 1 -0.56
# 2 A 2 -0.23
# 3 A 3 1.56
# 4 B 1 0.07
# 5 B 2 0.13
# 6 B 3 1.72
# 7 A 1 0.46
# 8 A 2 -1.27
# 9 A 3 -0.69
# 10 B 1 -0.45
# 11 B 2 1.22
# 12 B 3 0.36
# 13 A 1 0.40
# 14 A 2 0.11
# 15 A 3 -0.56
# 16 B 1 1.79
# 17 B 2 0.50
# 18 B 3 -1.97
dfs <- by(stackdf, stackdf[c("loc", "sub.loc")], FUN=function(df) {
data.frame(loc = df$loc[1],
sub.loc = df$sub.loc[1],
mean = mean(df$V3, na.rm=TRUE),
sd = sd(df$V3, na.rm=TRUE))
})
finaldf <- do.call(rbind, dfs)
finaldf
# loc sub.loc mean sd
# 1 A 1 0.10000000 0.5723635
# 2 B 1 0.47000000 1.1723481
# 3 A 2 -0.46333333 0.7189808
# 4 B 2 0.61666667 0.5542863
# 5 A 3 0.10333333 1.2631838
# 6 B 3 0.03666667 1.8661279
Upvotes: 0
Reputation: 206167
This can be done with dplyr
. First, I'm not sure how accurate your sample data above matches your real data but right now all your "numeric" values are factors. You really shouldn't use cbind()
inside as.data.frame()
, you can leave it out.
But with your example data above, we can stack the data into one larger data.frame and then do a simple group_by to get the values you want
library(dplyr)
bind_rows(list.name, .id="from") %>%
mutate(V3=as.numeric(as.character(V3))) %>% # fix the factors from the sample
group_by(loc, sub.loc) %>%
summarize(mean=mean(V3, na.rm=T), sd=sd(V3, na.rm=T))
Upvotes: 1