I am trying to build nested for loops in R. The inner loop works fine, but rather than having to edit the input files each time, I wanted to bundle them in a list and get a second loop to work through them sequentially. However, I can't get my head round how to extract the output from the outer loop. At the moment, I can just use a single loop and assign the output file to another object afterwards. It works fine, but is inefficient. Can anyone give me pointers where I'm going wrong, please?
Here is some made-up data. For brevity's sake, it is much smaller than my real data (seven input files of differing lengths). The full version has many more columns of both identifiers and values (so a mix of data types).
# Make some data (dplyr is needed for the summaries further down)
library(dplyr)
DATA.1 <- data.frame(
  "Type" = "Oranges", "Time" = "Day",
  "Group" = sample(rep(1:24, each = 6), replace = FALSE),
  "Val.1" = rnorm(144, mean = 0.5, sd = 1),
  "Val.2" = rnorm(144, mean = 100, sd = 30),
  "Val.3" = rnorm(144, mean = 2, sd = 1)
)
DATA.2 <- data.frame(
  "Type" = "Oranges", "Time" = "Day",
  "Group" = sample(rep(1:72, each = 6), replace = FALSE),
  "Val.1" = rnorm(432, mean = 0.5, sd = 1),
  "Val.2" = rnorm(432, mean = 100, sd = 30),
  "Val.3" = rnorm(432, mean = 2, sd = 1)
)
# Calculate means and standard deviations of data. (Will be output file during loop)
DATA.1out <- DATA.1 %>% group_by(Group) %>% summarise_at(.vars = 3:5, funs(mean, sd))
DATA.2out <- DATA.2 %>% group_by(Group) %>% summarise_at(.vars = 3:5, funs(mean, sd))
# Bind empty columns to populate with standard errors during the loop
DATA.1out <- cbind(DATA.1out, "Val.1_se" = NA, "Val.2_se" = NA, "Val.3_se" = NA)
DATA.2out <- cbind(DATA.2out, "Val.1_se" = NA, "Val.2_se" = NA, "Val.3_se" = NA)
# Loop input
DATA.in <- list(DATA.1, DATA.2)
# Loop output
DATA.out <- list(DATA.1out, DATA.2out)
# This loop calculates the cumulative standard error for each group. i.e. the mean
# and standard deviation apply to that group only, but the standard error is
# comprised of all the values up to and including the most recent group.
for (i in 1:2) {
  RAW.FILE <- DATA.in[[i]]
  OUTPUT.FILE <- DATA.out[[i]]
  COUNTER <- 1
  for(i in 1:nrow(OUTPUT.FILE)) {
    GROUP.NO <- data.frame(Group = c(1:COUNTER))
    TEMP <- RAW.FILE[RAW.FILE$Group %in% GROUP.NO$Group, ]
    TEMP$Val.1_se <- sd(TEMP$Val.1)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.1_se[i] <- unique(TEMP$Val.1_se)
    TEMP$Val.2_se <- sd(TEMP$Val.2)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.2_se[i] <- unique(TEMP$Val.2_se)
    TEMP$Val.3_se <- sd(TEMP$Val.3)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.3_se[i] <- unique(TEMP$Val.3_se)
    COUNTER <- COUNTER + 1
  }
  DATA.out[[i]] <- OUTPUT.FILE
}
This is probably not the most efficient way of doing it, but at least the inner loop works. However, I can't get the output to match up to the relevant DATA.out file. At the moment I end up with a list containing many blank dataframes, with the relevant output file in the slot equal to the number of rows of OUTPUT.FILE. How can I get the standard errors to append to the existing DATA.out dataframe?
Upvotes: 0
Reputation: 974
For what it's worth, here is a tidyverse (purrr) solution:
library(dplyr)
library(purrr)
DATA.out <- list(DATA.1, DATA.2) %>%
  map(function(dat) {
    # Per-group means and standard deviations
    out1 <- dat %>%
      group_by(Group) %>%
      summarise_at(.vars = 3:5, funs(mean, sd)) %>%
      arrange(Group)
    # Cumulative standard errors: all rows from group 1 up to the current group
    out2 <- out1$Group %>%
      map_df(~ dat %>%
        filter(Group %in% c(1:.x)) %>%
        select(Val.1_se = Val.1, Val.2_se = Val.2, Val.3_se = Val.3) %>%
        summarise_all(~ sd(.x) / sqrt(length(.x))))
    cbind(out1, out2)
  })
This would replace the for loop as well as the creation of the DATA.1out and DATA.2out variables.
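If you would rather avoid the extra packages, the same idea can be sketched in base R by wrapping the cumulative calculation in a function and calling it with lapply. This is only a rough sketch: cumulative_se and the toy data frame are hypothetical names made up for illustration, and it assumes Group is numeric so that "all groups up to the current one" can be written as Group <= g.

```r
# Hypothetical helper (not from the original posts): cumulative standard
# error per group, using every row whose Group is at or below the current one.
cumulative_se <- function(raw, cols = c("Val.1", "Val.2", "Val.3")) {
  groups <- sort(unique(raw$Group))
  se <- do.call(rbind, lapply(groups, function(g) {
    rows <- raw[raw$Group <= g, cols, drop = FALSE]
    # One standard error per value column
    vapply(rows, function(x) sd(x) / sqrt(length(x)), numeric(1))
  }))
  colnames(se) <- paste0(cols, "_se")
  data.frame(Group = groups, se)
}

# Small made-up input with the same value columns as DATA.1 and DATA.2
set.seed(1)
toy <- data.frame(
  Group = rep(1:4, each = 3),
  Val.1 = rnorm(12),
  Val.2 = rnorm(12, mean = 100, sd = 30),
  Val.3 = rnorm(12, mean = 2, sd = 1)
)

# One result per input; no index bookkeeping needed
se_list <- lapply(list(toy, toy), cumulative_se)
```

Each element of se_list has one row per group, so it could be joined to the matching summary table with something like merge(summary_table, se_list[[1]], by = "Group"), where summary_table stands for your own per-group means and standard deviations.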
Upvotes: 0
Reputation: 3369
This slight modification to your for loops should take care of the issue. When dealing with nested for loops you should give the iterators different variable names, so that you can reference the outer loop variable from within the inner loop. In this case, by changing the inner loop iterator from i to j, you can then move the OUTPUT.FILE assignment inside of the inner loop to get the results you are after.
for (i in 1:2) {
  RAW.FILE <- DATA.in[[i]]
  OUTPUT.FILE <- DATA.out[[i]]
  COUNTER <- 1
  for(j in 1:nrow(OUTPUT.FILE)) {
    GROUP.NO <- data.frame(Group = c(1:COUNTER))
    TEMP <- RAW.FILE[RAW.FILE$Group %in% GROUP.NO$Group, ]
    TEMP$Val.1_se <- sd(TEMP$Val.1)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.1_se[j] <- unique(TEMP$Val.1_se)
    TEMP$Val.2_se <- sd(TEMP$Val.2)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.2_se[j] <- unique(TEMP$Val.2_se)
    TEMP$Val.3_se <- sd(TEMP$Val.3)/sqrt(nrow(TEMP))
    OUTPUT.FILE$Val.3_se[j] <- unique(TEMP$Val.3_se)
    COUNTER <- COUNTER + 1
    DATA.out[[i]] <- OUTPUT.FILE
  }
}
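To see why sharing the iterator name produces a list padded with blanks, here is a stripped-down illustration. The list out and its contents are made-up stand-ins for DATA.out, and the inner loop body is left empty on purpose; the only point is where i ends up.

```r
# Stand-in for DATA.out: a list with two slots
out <- list("first", "second")

for (i in 1:2) {
  for (i in 1:5) {
    # The inner loop reuses `i` and leaves it at 5 when it finishes
  }
  # `i` is 5 here, not 1 or 2, so the write goes to slot 5
  out[[i]] <- "result"
}

length(out)           # 5
sapply(out, is.null)  # FALSE FALSE TRUE TRUE FALSE
```

Slots 1 and 2 are never updated, R pads slots 3 and 4 with NULL, and the result lands in slot 5, which mirrors the "slot equal to the number of rows" symptom in the question. With a separate inner iterator such as j, i keeps its outer value and the assignment goes to the intended slot.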
Upvotes: 1