I have a very large data frame in the following format:

| uniqueID | year | header_1 | header_2 | c | d | etc. |
|---|---|---|---|---|---|---|
| 0001 | 1990 | x | TRUE | | | |
| 0002 | 1990 | y | FALSE | other data | | |
| 0003 | 1995 | x | FALSE | | | |

I can filter, summarise, and rearrange it like this:
new_df <- filter(df, year %in% c(1990))
count_new_df <- group_by(new_df, header_1, header_2) %>%
  summarise(count = n())
count_wide <- count_new_df %>% pivot_wider(names_from = header_1, values_from = count)
If I run this as explicit code, it works perfectly. However, if I wrap it in a function where d is the starting data frame, y is the year of data I want to see, and a and b are the column headers, it breaks:
slice <- function(d, y, a, b) {
  t <- filter(d, year %in% c(y))
  c <- group_by(t, a, b) %>%
    summarise(count = n())
  c2 <- c %>% pivot_wider(names_from = a, values_from = count)
}
with the error message: Must group by variables found in `.data`. Column `a` is not found. Column `b` is not found.
If I switch to d$a and d$b, I get: object 'a' not found. I also tried group_by(t, t$a, t$b), and that didn't work either. What am I missing? There must be some way to refer to the columns of a data frame created inside a function.
TIA
Upvotes: 0
Views: 101
Reputation: 389275
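You can use `{{ }}` (the "embrace" operator) to refer to columns inside the function. It forwards the unquoted column names you pass as arguments to the dplyr and tidyr verbs: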
library(tidyverse)

new_slice <- function(d, y, a, b) {
  # Keep only the requested year(s)
  t <- filter(d, year %in% y)
  # {{ }} forwards the unquoted column names passed as a and b
  c <- group_by(t, {{ a }}, {{ b }}) %>% summarise(count = n())
  # Can also use count():
  # c <- count(t, {{ a }}, {{ b }}, name = "count")
  c2 <- c %>% pivot_wider(names_from = {{ a }}, values_from = count)
  c2
}

new_slice(df, 1990, header_1, header_2)
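
If you would rather pass the column names as strings (for example, when they come from a vector of names), a possible variant selects them with `all_of()` instead of `{{ }}`. The sketch below is only an illustration: the data in `toy_df` is made up, and the function names `count_wide_bare` and `count_wide_chr` are hypothetical, not part of any package.

```r
library(dplyr)
library(tidyr)

# Made-up data mirroring the layout in the question (illustration only).
toy_df <- tibble(
  uniqueID = c("0001", "0002", "0003", "0004"),
  year     = c(1990, 1990, 1995, 1990),
  header_1 = c("x", "y", "x", "x"),
  header_2 = c(TRUE, FALSE, FALSE, TRUE)
)

# Variant 1 (hypothetical name): bare column names, forwarded with {{ }}.
count_wide_bare <- function(d, y, a, b) {
  d %>%
    filter(year %in% y) %>%
    count({{ a }}, {{ b }}, name = "count") %>%
    pivot_wider(names_from = {{ a }}, values_from = count)
}

# Variant 2 (hypothetical name): column names given as character strings,
# selected with all_of() inside across() and in pivot_wider().
count_wide_chr <- function(d, y, a, b) {
  d %>%
    filter(year %in% y) %>%
    group_by(across(all_of(c(a, b)))) %>%
    summarise(count = n(), .groups = "drop") %>%
    pivot_wider(names_from = all_of(a), values_from = count)
}

res_bare <- count_wide_bare(toy_df, 1990, header_1, header_2)
res_chr  <- count_wide_chr(toy_df, 1990, "header_1", "header_2")
```

Both calls should give the same wide table, with one row per value of `header_2` and one column per value of `header_1`. The string version is handy when you loop over column names; the `{{ }}` version keeps the usual dplyr calling style.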
Upvotes: 2