intern

Reputation: 325

Using nested for loops to create a data frame in R

I am trying to use two nested for loops to build a data frame in R. I started on a function but got stuck. The outer loop should iterate over the names of a list of data frames, and the inner loop should iterate over selected columns of each data frame and compute their means. The result should be a data frame in which each row holds the column means for one of the input data frames. Here's some dummy data:

first<- data.frame(b = factor(c("Hi", "Hi","Hi","Hi")), y = c(8, 3, 9, 9),
               z = c(1, 1, 1, 2))
second<- data.frame(b = factor(c("Med", "Med", "Med", "Med")),y = c(3, 2, 6, 5),
                z = c(1, 11, 4, 3))

third<- list(first,second)
fourth<- c("first","second")
names(third)<- c(fourth)
fifth<- c("y","z")

Here's the function I was working on:

testr<- function(arg1,arg2){
  a<- list()
  for(i in 1:length(arg1)){
   b<- (third[[arg1[i]]])
    for(j in 1:length(arg2)){
      c<- mean(b[[arg2[[j]]]])
      a[[j]]<-c
    }
  }
  df<- do.call("cbind",a)
  df<-as.data.frame(df)
  df$name<- arg1
  return(df)
}

My goal would be this result:

testr(fourth,fifth)

    V1   V2  name
1 7.25 1.25 first
2    4 4.75 second

But instead I get this:

testr(fourth,fifth)

 Error in `$<-.data.frame`(`*tmp*`, "name", value = c("first", "second" : 
  replacement has 2 rows, data has 1 

Any help would be greatly appreciated!

Upvotes: 2

Views: 2305

Answers (2)

Brandon Bertelsen

Reputation: 44698

My advice: let's just avoid for loops altogether. It looks like you just want the mean of the two columns and the name of the data.frame.

Pick up some skills with dplyr or data.table that make this type of summarization trivial.

library(dplyr)
third %>% 
  bind_rows(.id = "name") %>% 
  group_by(name) %>% 
  summarize(
    V1 = mean(y), 
    V2 = mean(z))

# Source: local data frame [2 x 3]
#
#     name    V1    V2
#    (chr) (dbl) (dbl)
# 1  first  7.25  1.25
# 2 second  4.00  4.75


library(data.table)
dt <- rbindlist(third)
dt[,list(V1 = mean(y),V2 = mean(z)),by = b]
#      b   V1   V2
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75

# or as David points out.
dt[, lapply(.SD, mean), by = b]
#      b    y    z
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75
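If you do want to keep the loops, it helps to see why the original function fails: `a[[j]]` is overwritten on every pass of the outer loop, so only the means of the last data frame survive. `df` then has one row, while `arg1` holds two names. Below is a minimal sketch of a fix, assuming the dummy data from the question. The name `testr_fixed` and the extra `dfs` argument are my own additions, not part of the original code.

```r
# Dummy data from the question
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))
third <- list(first = first, second = second)
fourth <- c("first", "second")
fifth <- c("y", "z")

# testr_fixed is a hypothetical name; the list of data frames is passed
# in as `dfs` instead of being read from the global `third`
testr_fixed <- function(dfs, arg1, arg2) {
  rows <- vector("list", length(arg1))      # one slot per data frame
  for (i in seq_along(arg1)) {
    b <- dfs[[arg1[i]]]
    means <- numeric(length(arg2))          # one slot per column
    for (j in seq_along(arg2)) {
      means[j] <- mean(b[[arg2[j]]])
    }
    rows[[i]] <- means                      # keep the result of every outer pass
  }
  df <- as.data.frame(do.call(rbind, rows)) # rbind, so each data frame is a row
  df$name <- arg1
  df
}

result <- testr_fixed(third, fourth, fifth)
print(result)
```

The key changes are storing one vector of means per data frame and combining them with `rbind` rather than `cbind`.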

Upvotes: 1

Gopala

Reputation: 10483

Assuming you have many data frames like first and second, collected in a list, you can use dplyr to get the desired result as follows:

library(dplyr)
l <- list(first, second)
df <- do.call(rbind, l)
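# summarise_each() and funs() are deprecated in newer dplyr releases;
# there, summarise(across(everything(), mean)) is the usual replacement
df %>% group_by(b) %>% summarise_each(funs(mean))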

Output is:

Source: local data frame [2 x 3]

       b     y     z
  (fctr) (dbl) (dbl)
1     Hi  7.25  1.25
2    Med  4.00  4.75
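The same rbind-then-group idea also works without any packages. Here is a rough base R sketch using `aggregate()`, reusing the dummy data from the question (the name `agg` is just for illustration):

```r
# Dummy data from the question
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))

l <- list(first, second)
df <- do.call(rbind, l)   # stack all data frames into one

# Mean of y and z within each level of b
agg <- aggregate(cbind(y, z) ~ b, data = df, FUN = mean)
print(agg)
```

Like the dplyr version, this labels each row by the value of `b`, not by the name of the source data frame.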

Upvotes: 1
