Reputation: 325
I'm looking for a way to use two for loops to build a specific data frame in R. I started on a function but ran into trouble. The outer loop should iterate over the names of a list of data frames, and the inner loop should iterate over the columns of each data frame and compute the mean. The result should be a data frame in which each row holds the column means for one of the input data frames. Here's some dummy data:
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))
third <- list(first, second)
fourth <- c("first", "second")
names(third) <- c(fourth)
fifth <- c("y", "z")
Here's the function I was working on:
testr <- function(arg1, arg2) {
  a <- list()
  for (i in 1:length(arg1)) {
    b <- (third[[arg1[i]]])
    for (j in 1:length(arg2)) {
      c <- mean(b[[arg2[[j]]]])
      a[[j]] <- c
    }
  }
  df <- do.call("cbind", a)
  df <- as.data.frame(df)
  df$name <- arg1
  return(df)
}
My goal would be this result:
testr(fourth,fifth)
V1 V2 name
1 7.25 1.25 first
2 4 4.75 second
But instead I get this:
testr(fourth,fifth)
Error in `$<-.data.frame`(`*tmp*`, "name", value = c("first", "second" :
replacement has 2 rows, data has 1
Any help would be greatly appreciated!
Upvotes: 2
Views: 2305
Reputation: 44698
My advice: let's avoid for loops altogether. It looks like you just want the mean of the two columns and the name of the data.frame.
Pick up some skills with dplyr or data.table, which make this type of summarization trivial.
library(dplyr)
third %>%
  bind_rows(.id = "name") %>%
  group_by(name) %>%
  summarize(
    V1 = mean(y),
    V2 = mean(z))
# Source: local data frame [2 x 3]
#
# name V1 V2
# (chr) (dbl) (dbl)
# 1 first 7.25 1.25
# 2 second 4.00 4.75
library(data.table)
dt <- rbindlist(third)
dt[,list(V1 = mean(y),V2 = mean(z)),by = b]
# b V1 V2
# 1: Hi 7.25 1.25
# 2: Med 4.00 4.75
# or as David points out.
dt[, lapply(.SD, mean), by = b]
# b y z
# 1: Hi 7.25 1.25
# 2: Med 4.00 4.75
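If the loop form is still wanted, it helps to see why the original testr() fails: a[[j]] is overwritten on every pass of the outer loop, so only the means of the last data frame survive. cbind then yields a single row, and assigning two names to it raises the "replacement has 2 rows, data has 1" error. A minimal base R sketch of one possible fix follows. The name testr2 and its argument names are hypothetical, and the list is passed in as an argument instead of being read from the global environment.

```r
# Dummy data from the question
first  <- data.frame(b = factor(rep("Hi", 4)),  y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
second <- data.frame(b = factor(rep("Med", 4)), y = c(3, 2, 6, 5), z = c(1, 11, 4, 3))
third  <- list(first = first, second = second)

# Hypothetical rewrite of testr(): one matrix cell per (data frame, column) pair,
# so results from earlier iterations of the outer loop are not overwritten.
testr2 <- function(dfs, df_names, cols) {
  means <- matrix(NA_real_, nrow = length(df_names), ncol = length(cols))
  for (i in seq_along(df_names)) {
    d <- dfs[[df_names[i]]]
    for (j in seq_along(cols)) {
      means[i, j] <- mean(d[[cols[j]]])
    }
  }
  out <- as.data.frame(means)  # unnamed matrix columns become V1, V2, ...
  out$name <- df_names
  out
}

res <- testr2(third, c("first", "second"), c("y", "z"))
print(res)
```

Unlike the data.table calls above, which group by the factor column b, this keys each row on the list name, which is what the question's expected output shows.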
Upvotes: 1
Reputation: 10483
Assuming you have many data frames like first and second, collected in a list as shown below, you can use dplyr to get the desired result:
library(dplyr)
l <- list(first, second)
df <- do.call(rbind, l)
df %>% group_by(b) %>% summarise_each(funs(mean))
Output is:
Source: local data frame [2 x 3]
b y z
(fctr) (dbl) (dbl)
1 Hi 7.25 1.25
2 Med 4.00 4.75
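Note that summarise_each() and funs() are deprecated in later dplyr releases. A sketch of the same summary with across(), assuming dplyr 1.0.0 or newer is installed, might look like the following. It also uses the .id argument of bind_rows() so that rows are labelled by list name rather than by the factor b; the list names are set here for illustration.

```r
library(dplyr)

# Dummy data from the question
first  <- data.frame(b = factor(rep("Hi", 4)),  y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
second <- data.frame(b = factor(rep("Med", 4)), y = c(3, 2, 6, 5), z = c(1, 11, 4, 3))
third  <- list(first = first, second = second)

res <- third %>%
  bind_rows(.id = "name") %>%                  # list names become a "name" column
  group_by(name) %>%
  summarise(across(where(is.numeric), mean))   # mean of every numeric column (y, z)

print(res)
```

Selecting with where(is.numeric) skips the factor column b, so the call should not need editing when more numeric columns are added.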
Upvotes: 1