Reputation: 16842
I'm writing functions to automate a workflow for analyzing a lot of demographic data. I can get what I need from a regular pipe-stream of dplyr functions, but I need to abstract this into NSE functions. I'm supplying a column name to a series of gather calls via a ... argument, but this only works with a single column; I need the option of using multiple columns. I'm having trouble with how to use quos(...) in this case.
There's more to the function, but I'm including just enough to show the error.
Sample of data:
library(tidyverse)
race_pops <- structure(list(
town = c("Hamden", "Hamden", "Hamden", "Hamden","New Haven", "New Haven", "New Haven", "New Haven", "West Haven","West Haven", "West Haven", "West Haven"),
race = c("Total","White", "Black", "Latino", "Total", "White", "Black", "Latino","Total", "White", "Black", "Latino"),
est = c(61476, 37043, 13209,6450, 130405, 40164, 42970, 37231, 54972, 28864, 10677, 10977),
moe = c(31, 1039, 998, 879, 60, 1395, 1383, 1688, 42, 1226,1119, 1032),
region = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,2L, 1L, 1L, 1L, 1L), .Label = c("Inner Ring", "New Haven"), class = "factor")),
class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -12L))
Here's a working bit that yields my desired output:
race_pops %>%
  gather(key = measure, value = value, est, moe) %>%
  unite("grp2", race, measure, sep = "_") %>%
  spread(key = grp2, value = value) %>%
  gather(key = grp2, value = value, -town, -region, -starts_with("Total")) %>%
  head(10)
#> # A tibble: 10 x 6
#> town region Total_est Total_moe grp2 value
#> <chr> <fct> <dbl> <dbl> <chr> <dbl>
#> 1 Hamden Inner Ring 61476 31 Black_est 13209
#> 2 New Haven New Haven 130405 60 Black_est 42970
#> 3 West Haven Inner Ring 54972 42 Black_est 10677
#> 4 Hamden Inner Ring 61476 31 Black_moe 998
#> 5 New Haven New Haven 130405 60 Black_moe 1383
#> 6 West Haven Inner Ring 54972 42 Black_moe 1119
#> 7 Hamden Inner Ring 61476 31 Latino_est 6450
#> 8 New Haven New Haven 130405 60 Latino_est 37231
#> 9 West Haven Inner Ring 54972 42 Latino_est 10977
#> 10 Hamden Inner Ring 61476 31 Latino_moe 879
This is the function up to the point where I get the error:
gather_grp <- function(df, grp = group, value = est, moe = moe, ...) {
  name_vars <- quos(...)
  grp_var <- enquo(grp)
  value_var <- enquo(value)
  moe_var <- enquo(moe)
  df %>%
    gather(key = measure, value = value, -(!!!name_vars), -(!!grp_var)) %>%
    unite("grp2", !!grp_var, measure, sep = "_") %>%
    spread(key = grp2, value = value) %>%
    gather(key = grp2, value = value, -(!!!name_vars), -starts_with("Total"))
}
The function works if I drop region and use just the single column town:
race_pops %>%
  select(-region) %>%
  gather_grp(grp = race, value = est, moe = moe, town) %>%
  head(10)
#> # A tibble: 10 x 5
#> town Total_est Total_moe grp2 value
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 Hamden 61476 31 Black_est 13209
#> 2 New Haven 130405 60 Black_est 42970
#> 3 West Haven 54972 42 Black_est 10677
#> 4 Hamden 61476 31 Black_moe 998
#> 5 New Haven 130405 60 Black_moe 1383
#> 6 West Haven 54972 42 Black_moe 1119
#> 7 Hamden 61476 31 Latino_est 6450
#> 8 New Haven 130405 60 Latino_est 37231
#> 9 West Haven 54972 42 Latino_est 10977
#> 10 Hamden 61476 31 Latino_moe 879
But I can't supply both town and region to the ...:
race_pops %>%
  gather_grp(grp = race, value = est, moe = moe, town, region)
#> Error in (~town): 2 arguments passed to '(' which requires 1
Created on 2018-05-08 by the reprex package (v0.2.0).
Thanks in advance!
Upvotes: 3
Views: 241
Reputation: 887108
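We can wrap with c and it should work. With two columns, -(!!!name_vars) expands to -(town, region), and ( accepts only one argument, which is the error shown; c() accepts any number of arguments.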
gather_grp <- function(df, grp = group, value = est, moe = moe, ...) {
  name_vars <- quos(...)
  grp_var <- enquo(grp)
  value_var <- enquo(value)
  moe_var <- enquo(moe)
  df %>%
    gather(key = measure, value = value, -c(!!!name_vars), -!!grp_var) %>%
    unite("grp2", !!grp_var, measure, sep = "_") %>%
    spread(key = grp2, value = value) %>%
    gather(key = grp2, value = value, -c(!!!name_vars), -starts_with("Total"))
}
Running the function:
race_pops %>%
  gather_grp(grp = race, value = est, moe = moe, town, region)
# A tibble: 18 x 6
# town region Total_est Total_moe grp2 value
# <chr> <fct> <dbl> <dbl> <chr> <dbl>
# 1 Hamden Inner Ring 61476 31 Black_est 13209
# 2 New Haven New Haven 130405 60 Black_est 42970
# 3 West Haven Inner Ring 54972 42 Black_est 10677
# 4 Hamden Inner Ring 61476 31 Black_moe 998
# 5 New Haven New Haven 130405 60 Black_moe 1383
# 6 West Haven Inner Ring 54972 42 Black_moe 1119
# 7 Hamden Inner Ring 61476 31 Latino_est 6450
# 8 New Haven New Haven 130405 60 Latino_est 37231
# 9 West Haven Inner Ring 54972 42 Latino_est 10977
#10 Hamden Inner Ring 61476 31 Latino_moe 879
#11 New Haven New Haven 130405 60 Latino_moe 1688
#12 West Haven Inner Ring 54972 42 Latino_moe 1032
#13 Hamden Inner Ring 61476 31 White_est 37043
#14 New Haven New Haven 130405 60 White_est 40164
#15 West Haven Inner Ring 54972 42 White_est 28864
#16 Hamden Inner Ring 61476 31 White_moe 1039
#17 New Haven New Haven 130405 60 White_moe 1395
#18 West Haven Inner Ring 54972 42 White_moe 1226
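To see what the splice produces, here is a minimal sketch using rlang::expr(), which builds the call without evaluating it. It uses exprs() rather than quos() only so the printed call is easier to read; that substitution is an assumption made for illustration and is not part of the function above.

```r
library(rlang)

# Capture two bare column names, much as quos(...) does inside the function.
# exprs() is used here only so the result prints without quosure markers.
name_vars <- exprs(town, region)

# !!! splices each element in as its own argument. c() takes any number of
# arguments, so the resulting call is valid for a selection helper.
with_c <- expr(-c(!!!name_vars))
print(with_c)
#> -c(town, region)
```

The same splice inside plain parentheses would hand two arguments to (, which is what the error message in the question reports.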
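For the single-column case, we need to drop the other identifier column (region here) with select before calling the function; otherwise it is still a column in the dataset and gets gathered along with est and moe (or the function needs to be changed to handle this):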
race_pops %>%
  dplyr::select(-region) %>%
  gather_grp(grp = race, value = est, moe = moe, town)
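As one possible way of changing the function instead, the sketch below keeps only the columns it was told about before gathering, so unused identifier columns such as region no longer need to be dropped by hand. The name gather_grp_keep and the small pops tibble (a subset of the question's data) are hypothetical, introduced only for this illustration; it is not the original author's function.

```r
library(tidyverse)

# Small subset of the question's data (hypothetical object name).
pops <- tibble(
  town   = rep(c("Hamden", "New Haven"), each = 3),
  race   = rep(c("Total", "White", "Black"), times = 2),
  est    = c(61476, 37043, 13209, 130405, 40164, 42970),
  moe    = c(31, 1039, 998, 60, 1395, 1383),
  region = rep(c("Inner Ring", "New Haven"), each = 3)
)

# Hypothetical variant: select the named columns first, then reshape.
gather_grp_keep <- function(df, grp, value, moe, ...) {
  name_vars <- quos(...)
  grp_var <- enquo(grp)
  value_var <- enquo(value)
  moe_var <- enquo(moe)
  df %>%
    # Keep only the identifier, group, estimate and margin columns.
    select(!!!name_vars, !!grp_var, !!value_var, !!moe_var) %>%
    gather(key = measure, value = value, -c(!!!name_vars), -!!grp_var) %>%
    unite("grp2", !!grp_var, measure, sep = "_") %>%
    spread(key = grp2, value = value) %>%
    gather(key = grp2, value = value, -c(!!!name_vars), -starts_with("Total"))
}

# region is still present in pops, but it is ignored because it was not named.
result <- pops %>%
  gather_grp_keep(grp = race, value = est, moe = moe, town)
```

The trade-off is that any column not passed through ... or the named arguments is silently discarded, which may or may not be what the rest of the workflow expects.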
Upvotes: 4