I'm having an issue applying a function with nested for loops to output data values per individual and per month. Currently I can apply the function with a single for loop, so I get the data per month.
The dataset and function I'm using are very large, but I've created some example data and an example function below.
testdat <- structure(list(code = c("a", "a", "a", "a", "a", "a", "b", "b",
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c"), datetime = c("16/04/2016",
"17/04/2016", "25/05/2016", "26/05/2016", "01/06/2016", "02/06/2016",
"16/05/2016", "17/05/2016", "25/06/2016", "26/06/2016", "01/07/2016",
"02/07/2016", "16/06/2016", "17/06/2016", "25/07/2016", "26/07/2016",
"01/08/2016", "02/08/2016"), score = c(17L, 16L, 12L, 16L, 14L,
2L, 1L, 10L, 13L, 12L, 0L, 7L, 17L, 8L, 15L, 20L, 0L, 4L), monthyear = c("2016/04",
"2016/04", "2016/05", "2016/05", "2016/06", "2016/06", "2016/05",
"2016/05", "2016/06", "2016/06", "2016/07", "2016/07", "2016/06",
"2016/06", "2016/07", "2016/07", "2016/08", "2016/08")), class = "data.frame", row.names = c(NA,
-18L))
library(dplyr)

# months to loop over: "2016/04" to "2016/08"
month_list <- strftime(seq(as.Date("2016/04/01"), as.Date("2016/08/31"), by = "month"), format = "%Y/%m")
test_func <- function(dat) {
  # mean and standard deviation of score for the subset passed in
  metrics <- dat %>% summarize(
    mean = mean(score, na.rm = TRUE),
    sd = sd(score, na.rm = TRUE))
  # label the result with the subset's code and month
  metrics$code <- rep(first(dat$code), nrow(metrics))
  metrics$monthyear <- rep(first(dat$monthyear), nrow(metrics))
  return(metrics)
}
my_datalist <- list()
for (i in month_list) {
  # filter to one month, apply the function and store the output in the list
  my_datalist[[i]] <- testdat %>%
    filter(monthyear == i) %>%
    test_func
}
# turn the list into a data frame
my_metric_data <- do.call(rbind, my_datalist)
This returns one row of data for each month in my month list. I now need to apply this function (test_func) to each individual in the dataset per month. So I thought I'd construct a nested for loop: filter the data per month, create a list of the individuals (code) for that month, then apply the function to each individual in that list.
my_datalist <- list()
for (i in month_list) {
  # keep only this month's rows
  dat <- testdat %>%
    filter(monthyear == i)
  code_list <- as.character(unique(dat$code))
  for (j in code_list) {
    # apply the function to this individual's rows
    my_datalist[[j]] <- dat %>%
      filter(code == j) %>%
      test_func
  }
}
my_metric_data <- do.call(rbind, my_datalist)
However, when I examine the output, it looks like the function is only being applied to the first code rather than returning the data per code, per month, and I'm not sure why. I think I may need to make another empty list to populate and then add it to the first list, but my attempts at this haven't worked so far.
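The symptom follows from how the results list is indexed. This minimal base-R sketch uses hypothetical toy labels instead of the real metrics: storing results with my_datalist[[j]] keys them by code only, so a code that appears in several months keeps only its last month, whereas a key built from both month and code keeps every pair.

```r
# Toy version of the nested loop, storing a label per month/code pair
months <- c("2016/05", "2016/06")
codes <- c("b", "c")

by_code <- list()
for (i in months) {
  for (j in codes) {
    by_code[[j]] <- paste(i, j)  # keyed by code only: later months overwrite
  }
}

by_pair <- list()
for (i in months) {
  for (j in codes) {
    by_pair[[paste(i, j)]] <- paste(i, j)  # one key per month/code pair
  }
}

length(by_code)  # 2
length(by_pair)  # 4
by_code[["b"]]   # "2016/06 b"
```

Keying the list by paste(i, j) is one possible fix; the index counter in the answer below is another.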
A colleague of mine helped me solve this, so I thought I'd post the answer.
The original loop stores results with my_datalist[[j]], so the list is keyed by code and each month's results overwrite the previous month's results for the same code. The easiest way to fix it is probably to set up an index counter variable before you run the loops:
idx <- 1
and then, within the inner loop (the j one), use it to index the results list and add 1 afterwards so that the next result goes into the next slot. The resulting code looks like this:
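datalist <- list()
idx <- 1  # position of the next free slot in datalist
for (i in month_list) {
  # use a new name so the full dataset is not overwritten on each iteration
  month_dat <- testdat %>%
    filter(monthyear == i)
  # keep only codes with at least 20 observations in this month
  # (threshold for the full dataset; every code in the example data has 2)
  code_obs <- month_dat %>%
    group_by(code) %>%
    summarise(n = n()) %>%
    filter(n >= 20) %>%
    ungroup()
  code_list <- as.character(unique(code_obs$code))
  for (j in code_list) {
    # test_func from the question (nodeMetrics_func in the full analysis)
    datalist[[idx]] <- month_dat %>%
      filter(code == j) %>%
      test_func
    idx <- idx + 1
  }
}
my_metric_data <- do.call(rbind, datalist)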
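To check the counter approach end to end, here is a self-contained base-R sketch on the example data. It rebuilds testdat compactly (without the datetime column), swaps in a hypothetical base-R function, summ_func, for test_func so dplyr isn't needed, and leaves out the n >= 20 filter because every code has only two rows per month in the example.

```r
# Compact rebuild of the example data (datetime column omitted)
testdat <- data.frame(
  code = rep(c("a", "b", "c"), each = 6),
  monthyear = c(rep(c("2016/04", "2016/05", "2016/06"), each = 2),
                rep(c("2016/05", "2016/06", "2016/07"), each = 2),
                rep(c("2016/06", "2016/07", "2016/08"), each = 2)),
  score = c(17, 16, 12, 16, 14, 2, 1, 10, 13, 12, 0, 7, 17, 8, 15, 20, 0, 4),
  stringsAsFactors = FALSE
)
month_list <- format(seq(as.Date("2016-04-01"), as.Date("2016-08-31"), by = "month"), "%Y/%m")

# Hypothetical base-R stand-in for test_func: one row of metrics per subset
summ_func <- function(dat) {
  data.frame(mean = mean(dat$score, na.rm = TRUE),
             sd = sd(dat$score, na.rm = TRUE),
             code = dat$code[1],
             monthyear = dat$monthyear[1],
             stringsAsFactors = FALSE)
}

datalist <- list()
idx <- 1  # next free slot in datalist
for (i in month_list) {
  month_dat <- testdat[testdat$monthyear == i, ]  # testdat itself stays intact
  for (j in unique(month_dat$code)) {
    datalist[[idx]] <- summ_func(month_dat[month_dat$code == j, ])
    idx <- idx + 1
  }
}
my_metric_data <- do.call(rbind, datalist)
nrow(my_metric_data)  # 9: one row per code per month
```

The nine rows cover code a in 2016/04 to 2016/06, b in 2016/05 to 2016/07 and c in 2016/06 to 2016/08, ordered by month and then by code.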
We don't have any data to run or verify the solution, but can you try this split + lapply approach?
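# testdat and test_func are the example data and function from the question;
# drop = TRUE skips month/code combinations that have no rows
result <- do.call(rbind, lapply(split(testdat,
  list(testdat$monthyear, testdat$code), drop = TRUE), test_func))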
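The split + lapply version can be checked the same way. This sketch again uses the hypothetical summ_func in place of test_func and shows why drop = TRUE matters: splitting on month and code without it creates all 15 month/code combinations, including empty ones that would produce rows with missing values.

```r
testdat <- data.frame(
  code = rep(c("a", "b", "c"), each = 6),
  monthyear = c(rep(c("2016/04", "2016/05", "2016/06"), each = 2),
                rep(c("2016/05", "2016/06", "2016/07"), each = 2),
                rep(c("2016/06", "2016/07", "2016/08"), each = 2)),
  score = c(17, 16, 12, 16, 14, 2, 1, 10, 13, 12, 0, 7, 17, 8, 15, 20, 0, 4),
  stringsAsFactors = FALSE
)

# Hypothetical base-R stand-in for test_func
summ_func <- function(dat) {
  data.frame(mean = mean(dat$score, na.rm = TRUE),
             sd = sd(dat$score, na.rm = TRUE),
             code = dat$code[1],
             monthyear = dat$monthyear[1],
             stringsAsFactors = FALSE)
}

# every month x code combination, including ones with no rows
groups_all <- split(testdat, list(testdat$monthyear, testdat$code))
# drop = TRUE keeps only the combinations that actually occur
groups <- split(testdat, list(testdat$monthyear, testdat$code), drop = TRUE)

result <- do.call(rbind, lapply(groups, summ_func))
length(groups_all)  # 15
length(groups)      # 9
nrow(result)        # 9
```

With dplyr loaded, group_by(code, monthyear) followed by summarise() gives the same one-row-per-group result without an explicit split.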