Reputation: 1673
After reading and re-reading the many "Programming with dplyr" guides, I still cannot find a way to solve my particular case.
I understand that group_by_(), mutate_() and similar "string-friendly" versions of tidyverse functions are heading toward deprecation, and that enquo() is the way to go.
However, my case is somewhat different, and I'm struggling to find a tidy way to solve it.
My aim is to create and manipulate data frames within a function: creating (mutating) new variables based on others, using them, and so on.
However, no matter what I try, my code either errors or triggers notes during the package check, such as "no visible binding for global variable ...".
Here's a reproducible example of what I want to do:
df <- data.frame(X=c("A", "B", "C", "D", "E"),
                 Y=c(1, 2, 3, 1, 1))
new_df <- df %>%
  group_by(Y) %>%
  summarise(N=n()) %>%
  mutate(Y=factor(Y, levels=1:5)) %>%
  complete(Y, fill=list(N = 0)) %>%
  arrange(Y) %>%
  rename(newY=Y) %>%
  mutate(Y=as.integer(newY))
These are some common dplyr manipulations, and the expected result is:
# A tibble: 5 x 3
    newY     N     Y
  <fctr> <dbl> <int>
1      1     3     1
2      2     1     2
3      3     1     3
4      4     0     4
5      5     0     5
I would like this piece of code to work quietly inside a function. The following was my best attempt to deal with the NSE issues:
myfunction <- function(){
  df <- data.frame(X=c("A", "B", "C", "D", "E"),
                   Y=c(1, 2, 3, 1, 1))
  new_df <- df %>%
    group_by_("Y") %>%
    summarise(!!"N":=n()) %>%
    mutate(!!"Y":=factor(Y, levels=1:5)) %>%
    complete_("Y", fill=list(N = 0)) %>%
    arrange_("Y") %>%
    rename(!!"newY":="Y") %>%
    mutate(!!"Y":=as.integer(newY))
}
Unfortunately, I still got the following messages:
myfunction: no visible global function definition for ':='
myfunction: no visible binding for global variable 'Y'
myfunction: no visible binding for global variable 'newY'
Undefined global functions or variables:
:= Y n.Factors n_optimal newY
Is there a way to solve it? Thanks a lot!
EDIT: I'm using R 3.4.1, dplyr_0.7.4, tidyr_0.7.2 and tidyverse_1.1.1
Thanks to the comments, I've managed to solve it. Here's the working solution:
myfunction <- function(){
  df <- data.frame(X=c("A", "B", "C", "D", "E"),
                   Y=c(1, 2, 3, 1, 1))
  new_df <- df %>%
    group_by_("Y") %>%
    summarise_("N"=~n()) %>%
    mutate_("Y"= ~factor(Y, levels=1:5)) %>%
    complete_("Y", fill=list(N = 0)) %>%
    arrange_("Y") %>%
    rename_("newY"=~Y) %>%
    mutate_("Y"=~as.integer(newY))
}
Thanks A LOT :)
Upvotes: 6
Views: 1370
Reputation: 119
The answer wasn't in the "Programming with dplyr" guides because your issue is more general. Although your code deals with non-standard evaluation, your case does not need it. If you remove the code that deals with non-standard evaluation, you will reduce the number of problems you need to fix.
Still, some important issues remain: issues of NAMESPACE. You deal with NAMESPACE any time you use functions from other packages inside functions of your own package. NAMESPACE is not an easy topic, but if you are writing packages it will pay off to learn a bit. I recommend reading the "Imports" section of r-pkgs.had.co.nz/namespace.html: its introduction and the subheading "R functions". That will help you understand the steps, code and comments that I post below.
Follow these steps to fix your problem:
- Add dplyr, magrittr, rlang, and tidyr to DESCRIPTION.
- Refer to functions as PACKAGE::FUNCTION().
- Remove all !! and := because in this case you don't need them.
- Import and export the pipe from magrittr.
- Import .data from rlang.
- Pass global variables to utils::globalVariables().
- Rebuild, reload, recheck.
# I've shortened your function to focus on the important details.
myfunction <- function(){
  df <- data.frame(
    X = c("A", "B", "C", "D", "E"),
    Y = c(1, 2, 3, 1, 1)
  )
  df %>%
    dplyr::group_by(.data$Y) %>%
    dplyr::summarise(N = n())
}
# Fix check() notes
#' @importFrom magrittr %>%
#' @export
magrittr::`%>%`
#' @importFrom rlang .data
NULL
utils::globalVariables(c(".data", "n"))
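For what it's worth, the same pattern seems to extend to most of the remaining steps of the original pipeline. Here is a rough sketch, not a tested package solution: pipeline_sketch() is a hypothetical name, complete() is left out, and I am assuming a dplyr version recent enough that dplyr::n() can be called with its namespace prefix (I believe 0.8.0 or later). In a package you would import the pipe as shown above instead of calling library().

```r
library(magrittr)  # provides %>% for this standalone sketch

# pipeline_sketch() is a hypothetical name used only for illustration.
pipeline_sketch <- function() {
  df <- data.frame(
    X = c("A", "B", "C", "D", "E"),
    Y = c(1, 2, 3, 1, 1)
  )
  df %>%
    dplyr::group_by(.data$Y) %>%
    # Assumption: dplyr::n() works with the prefix in your dplyr version.
    dplyr::summarise(N = dplyr::n()) %>%
    dplyr::mutate(Y = factor(.data$Y, levels = 1:5)) %>%
    dplyr::arrange(.data$Y) %>%
    # rename() accepts the old column name as a string.
    dplyr::rename(newY = "Y") %>%
    dplyr::mutate(Y = as.integer(.data$newY))
}

result <- pipeline_sketch()
```

Because every column is reached through .data or a string, there are no bare column names left for the check to flag.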
Upvotes: 4
Reputation: 808
You can use rlang::sym() (or base::as.name()) to convert character strings to symbols, so let me add an alternative answer.
Note that I don't mean to force you to abandon these deprecated functions. Use whichever is easiest for you to understand. (I believe sym() is more useful, though.)
rlang::sym()
This code
group_by_("Y") %>%
can be written as
group_by(!! rlang::sym("Y"))
or you can even assign the symbol to a variable beforehand.
col_Y <- rlang::sym("Y")
df %>%
group_by(!! col_Y)
This code is totally fine.
summarise(!!"N":=n())
Both character strings and symbols are permitted on the left-hand side of :=. So this is also fine:
col_N <- rlang::sym("N")
# ...
summarise(!! col_N := n())
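A quick way to convince yourself that the two left-hand-side forms are equivalent is to compare their results. This is only a small sketch on the data frame from the question; by_string and by_symbol are names I made up for illustration.

```r
library(dplyr)

df <- data.frame(X = c("A", "B", "C", "D", "E"),
                 Y = c(1, 2, 3, 1, 1))

# A string on the left-hand side of :=
by_string <- summarise(df, !! "N" := n())

# A symbol on the left-hand side of :=
col_N <- rlang::sym("N")
by_symbol <- summarise(df, !! col_N := n())

# Both calls should produce the same one-column result.
identical(by_string, by_symbol)
```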
select() and rename() have different semantics from other functions like mutate(): they accept character strings in addition to symbols. This is a somewhat advanced topic; you can find a more detailed explanation in a vignette.
More precisely, both lines of code below are permitted:
rename(new = old)
rename(new = "old")
So, this code is fine.
rename(!! "newY" := "Y")
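To illustrate the difference in semantics, here is a small sketch contrasting rename() and mutate() when both are given the same string. The data frame and the names renamed and mutated are made up for the example.

```r
library(dplyr)

df <- data.frame(old = 1:3)

# rename() reads the string as the name of an existing column.
renamed <- rename(df, new = "old")

# mutate() treats the string as a literal value and recycles it.
mutated <- mutate(df, new = "old")

names(renamed)
names(mutated)
```

So rename() leaves a single column called new, while mutate() keeps old and adds a new column filled with the text "old".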
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-12
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
df <- data.frame(X=c("A", "B", "C", "D", "E"),
Y=c(1, 2, 3, 1, 1))
col_Y <- rlang::sym("Y")
col_N <- rlang::sym("N")
col_newY <- rlang::sym("newY")
df %>%
  group_by(!! col_Y) %>%
  summarise(!! col_N := n()) %>%
  mutate(!! col_Y := factor(!! col_Y, levels=1:5)) %>%
  complete(!! col_Y, fill = list(N = 0)) %>%
  arrange(!! col_Y) %>%
  rename(!! col_newY := !! col_Y) %>%
  mutate(!! col_Y := as.integer(!! col_newY))
#> # A tibble: 5 x 3
#>     newY     N     Y
#>   <fctr> <dbl> <int>
#> 1      1     3     1
#> 2      2     1     2
#> 3      3     1     3
#> 4      4     0     4
#> 5      5     0     5
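If the column names arrive as strings, for example as function arguments, the same sym() approach can be wrapped in a function. A minimal sketch, assuming that is what you want: count_levels() and its arguments data and col_name are hypothetical names, not part of dplyr or rlang.

```r
library(dplyr, warn.conflicts = FALSE)

# count_levels() is a hypothetical helper used only for illustration.
# col_name is the grouping column, given as a character string.
count_levels <- function(data, col_name) {
  col <- rlang::sym(col_name)  # string -> symbol
  data %>%
    group_by(!! col) %>%       # unquote the symbol inside the verb
    summarise(N = n())
}

df <- data.frame(X = c("A", "B", "C", "D", "E"),
                 Y = c(1, 2, 3, 1, 1))

counts <- count_levels(df, "Y")
```

The function body never mentions a bare column name from the input, since the grouping column is always reached through the symbol built from col_name.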
Upvotes: 1