MorrisseyJ

Reputation: 1271

Apply function over list, then iterate over second variable, in R

I am trying to apply a function over a list while also iterating over a second argument of that function, in R.

Here is an example:

Create the data

A <- data.frame(var = 1:3, year = 2000:2002)
B <- data.frame(var = 4:6, year = 2000:2002)
C <- data.frame(var = 7:9, year = 2000:2002)

ABC <- list(A, B, C)

> ABC
[[1]]
  var year
1   1 2000
2   2 2001
3   3 2002

[[2]]
  var year
1   4 2000
2   5 2001
3   6 2002

[[3]]
  var year
1   7 2000
2   8 2001
3   9 2002

Write the function, sum, which simply filters for a start year and sums the 'var' values. (Sorry, this simple function got messier in this example than I had intended.) Note that naming it sum masks base R's sum for the rest of the session.

library(dplyr) 

sum <- function(dat, start.year) {
  dat %>%
    filter(year >= start.year) %>%
    select(var) %>%
    colSums() %>%
    data.frame(row.names = NULL) %>%
    rename(var = '.') %>%
    mutate(start = start.year)
}

Now I can apply the function to the list (and bind_rows to get a neat output):

lapply(ABC, sum, 2000) %>%
  bind_rows()
  var start
1   6  2000
2  15  2000
3  24  2000

What I want to do, however, is iterate over start.year as well, creating data frames for start.year = c(2000, 2001, 2002). This would ideally give:

  var start
1   6  2000
2  15  2000
3  24  2000
4   5  2001
5  11  2001
6  17  2001
7   3  2002
8   6  2002
9   9  2002

I have looked at map2, but its documentation talks about using vectors of the same length. That would work in this case, but imagine my list had 4 items in it and only 3 records per item, so I assume map2 is doing something different. I also thought about a nested for loop. When I started writing that, however, I realized I would be dealing with list.append functions in R, and that seemed wrong. I assume this is an easy thing to do. Any help would be appreciated.

Upvotes: 2

Views: 1238

Answers (1)

akrun

Reputation: 887118

We can do this with a nested lapply/map:

library(purrr)
map_dfr(2000:2002, ~ map_dfr(ABC, sum, .x))
#   var start
#1   6  2000
#2  15  2000
#3  24  2000
#4   5  2001
#5  11  2001
#6  17  2001
#7   3  2002
#8   6  2002
#9   9  2002
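As a side note, recent purrr releases (1.0.0 and later) mark map_dfr as superseded in favour of map followed by list_rbind. A minimal sketch of the same nested idea in that style follows. It assumes purrr >= 1.0.0 is installed, and it uses sum_from, a hypothetical base R stand-in for the question's function, so that the snippet runs on its own:

```r
library(purrr)  # assumption: purrr >= 1.0.0, which provides list_rbind()

# Same example data as in the question.
A <- data.frame(var = 1:3, year = 2000:2002)
B <- data.frame(var = 4:6, year = 2000:2002)
C <- data.frame(var = 7:9, year = 2000:2002)
ABC <- list(A, B, C)

# Hypothetical stand-in for the question's function: sum 'var' from
# start.year onwards and record which start year was used.
sum_from <- function(dat, start.year) {
  data.frame(var = base::sum(dat$var[dat$year >= start.year]),
             start = start.year)
}

# Outer map: one data frame per start year.
# Inner map: one single-row data frame per list element.
per_year <- map(2000:2002, function(y) list_rbind(map(ABC, sum_from, y)))
res <- list_rbind(per_year)
res
```

The row order matches the output above: all list elements for 2000 first, then 2001, then 2002.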

Or, inspired by @thelatemail's suggestion with Map:

map2_dfr(rep(ABC, 3), rep(2000:2002, each = length(ABC)), sum)
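The reason both inputs have to be repeated is that map2 walks its two arguments in parallel, pairing the first element with the first, the second with the second, and so on. It does not build every combination by itself. A small illustration with toy inputs (the vectors x and y are made up for this sketch and stand in for the list and the years):

```r
library(purrr)

x <- c("A", "B", "C")   # stands in for the list elements
y <- 2000:2002          # stands in for the start years

# Parallel iteration: three pairs, not nine combinations.
paired <- map2_chr(x, y, paste)
paired
# "A 2000" "B 2001" "C 2002"

# Repeating x as a whole and each element of y length(x) times lines the
# two inputs up so that every combination appears exactly once.
all_combos <- map2_chr(rep(x, length(y)), rep(y, each = length(x)), paste)
length(all_combos)
# 9
```

Writing rep(x, length(y)) rather than a hard-coded 3 keeps the pairing correct when the list and the vector of years have different lengths, which is the situation the question worries about.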

With lapply:

do.call(rbind, lapply(2000:2002, function(x) do.call(rbind, lapply(ABC, sum, x))))
#   var start
#1   6  2000
#2  15  2000
#3  24  2000
#4   5  2001
#5  11  2001
#6  17  2001
#7   3  2002
#8   6  2002
#9   9  2002

Or, as @thelatemail mentioned, with Map (which recycles the shorter argument, here ABC, to the length of the longer one):

do.call(rbind, Map(sum, ABC, start.year=rep(2000:2002,each=length(ABC))))
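If relying on recycling feels too implicit, the pairs can be spelled out first. The sketch below is one possible way to do that in base R: expand.grid builds one row per (list index, start year) combination, and Map then iterates over those rows. The helper sum_from is a hypothetical base R stand-in for the question's function, used only so the snippet is self-contained:

```r
# Same example data as in the question.
A <- data.frame(var = 1:3, year = 2000:2002)
B <- data.frame(var = 4:6, year = 2000:2002)
C <- data.frame(var = 7:9, year = 2000:2002)
ABC <- list(A, B, C)

# Hypothetical stand-in for the question's function.
sum_from <- function(dat, start.year) {
  data.frame(var = base::sum(dat$var[dat$year >= start.year]),
             start = start.year)
}

years <- 2000:2002

# One row per combination; the first column (i) varies fastest, so the
# order is: every list element for 2000, then for 2001, then for 2002.
grid <- expand.grid(i = seq_along(ABC), start = years)

# Apply the function to each (index, year) pair and stack the results.
res <- do.call(rbind, Map(function(i, s) sum_from(ABC[[i]], s),
                          grid$i, grid$start))
res
```

Because the grid is built from seq_along(ABC) and years, this works unchanged when the list has, say, 4 elements and there are only 3 start years.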

If the OP's function can be changed, another option is the following (base::sum is spelled out because the OP's function masks sum):

library(dplyr)
library(tidyr)
map_dfr(ABC, ~ .x %>%
  crossing(year2 = 2000:2002) %>%
  filter(year >= year2) %>%
  group_by(year2) %>%
  summarise(var = base::sum(var)))

Or, instead of working on the list, we can bind the data frames together with bind_rows, cross them with the input years, and then do a grouped sum:

bind_rows(ABC, .id = 'grp') %>%
     group_by(grp) %>% 
     crossing(year2 = 2000:2002) %>% 
     filter(year >= year2) %>% 
     group_by(grp, year2) %>%
     summarise(var = base::sum(var))

Upvotes: 2
