Reputation: 2097
library(tidyverse)
library(purrr)
Using the sample data below, I can create the following function:
Funs <- function(DF, One, Two){
One <- enquo(One)
Two <- enquo(Two)
DF %>% filter(School == (!!One) & Code == (!!Two)) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1))
}
I can then use the function to filter on two variables - School and Code - like this:
Funs(DF, "School1", "B344")
That's all good, but my actual data has many variables and therefore instead of having to constantly type the "School" and "Code" variables into the function, I would like to use tidyverse and the purrr package to loop over two lists (one of School, and one for Code) and feed this into filter. I would like the output to be a list of results.
For the sake of simplicity, the two lists to feed into dplyr::filter will only have two values each: School2 will go with S300, and School1 with B344, just like the example above.
Some examples that I've tried:
map2(c(“School2”, ”School1”),
c(“S300”, ”B344”),
function(x,y) {
DF %>% filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1))
}
Also...
map2(c("School2", "School1")),
c("S300","B344"),
~filter(School == .x & Code == .y) %>%
group_by(Code, School)%>%
summarise(Count = sum(Question1))
And this...
list(c("School2", "School1"), c("S300", "B344")) %>%
map2( ~ filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1)))
None of these seem to work, so help would be appreciated!
Sample data:
Code <- c("B344","B555","S300","T220","B888","B888","B555","B344","B344","T220","B555","B555","S300","B555","S300","S300","S300","S300","B344","B344","B888","B888","B888")
School <- c("School1","School1","School2","School3","School4","School4","School1","School1","School3","School3","School4","School1","School1","School3","School2","School2","School4","School2","School3","School4","School3","School1","School2")
Question1 <- c(3,4,5,4,5,5,5,4,5,3,4,5,4,5,4,3,3,3,4,5,4,3,3)
Question2 <- c(5,4,3,4,3,5,4,3,2,3,4,5,4,5,4,3,4,4,5,4,3,3,4)
DF <- data_frame(Code, School, Question1, Question2)
Upvotes: 1
Views: 1431
Reputation: 43334
Here's some options, from most like your code to most optimal:
library(tidyverse)
DF <- data_frame(Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4))
wanted <- data_frame(School = c("School2", "School1"),
Code = c("S300", "B344"))
To get map2
to work, if using tilde notation, the variables are named .x
and .y
; if you use regular function notation, you can call them whatever you like. Don't forget that the first parameter of filter
is the data frame piped in, so:
map2_dfr(wanted$School, wanted$Code, ~filter(DF, School == .x, Code == .y)) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
Since I set up wanted
as a data frame (a vanilla list would work too), you can use pmap
instead. Parameter names with pmap
can actually be the same as map2
for two variables, but it's really a function with ...
for its parameters, so it often makes sense to handle them differently, e.g. with ..1
notation:
wanted %>%
pmap_dfr(~filter(DF, School == ..1, Code == ..2)) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
The problem with both of the above techniques is that at scale, they will be slow, because they are running filter
for every row of wanted
, meaning you're retesting each row many times. To keep the code similar, a slightly hacky way to avoid the extra work is to combine the columns into one, e.g. with tidyr::unite
:
DF %>%
unite(school_code, School, Code) %>%
filter(school_code %in% invoke(paste, wanted, sep = '_')) %>% # or paste(wanted$School, wanted$Code, sep = '_') or equivalent
separate(school_code, c('School', 'Code')) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
...or just combine them within filter
itself:
DF %>%
filter(paste(School, Code) %in% paste(wanted$School, wanted$Code)) %>% # or invoke(paste, wanted)
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
The best way to get your desired result is perhaps more obvious now that I've set up wanted
as a data frame: a join, which is designed to do precisely this job:
DF %>%
inner_join(wanted) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> Joining, by = c("Code", "School")
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
Upvotes: 1