Silviculturalist

Reputation: 199

Quasiquotation in purrr

As part of a larger function that retains only the values in a plant-growth time series that occur before an injury, for each individual (plantid), I'm writing two chunks which, in order, will:

  1. Check that all variables given in an argument are character vectors (in the second chunk, %in% doesn't recognise the named factors) and, if not, convert them to character while issuing a warning.

  2. Identify and mark the rows in which any of those variables contains one of the strings from argument b.

I'm quite sure I'm getting something wrong with the quotation/quasiquotation or the bang-bang (!!)/big-bang (!!!) operators (this is my first time writing a function with quotation). I keep getting messages such as “!!! may not be used at top-level”, or the like, and I'm not sure how to solve them. I also need help finding a good way to convert the variables that aren't characters.

This is what I've got so far

Argument description

Function

id_injured <- function(df, plantid, year, injuries, forbidden_values){
    #parsing unquoted strings.
    plantid <- enquo(plantid)
    year <- enquo(year)
    forbidden_values <- enquos(forbidden_values)
    injuries <- syms(injuries)

    #if all variables in injuries are not characters, stop and warn (attempt to convert to character those variables which are not character)

    if(!all(purrr::pmap_int(select(df, !!!injuries), ~is.character(...)))){
       stop("All injury variables are not characters. Convert factors in injuries to character variables")} else {
          (1) #Control to give output while testing function, replace with conversion and warning?
    }

    #Identify rows with matching injury codes with 1, else 0.
    Dataplantid <- df %>% mutate(is_injured = purrr::pmap_int(select(df, !!!injuries), any(c(...) %in% !!!forbidden_values)))

    #End of function
}


Intended use

I've removed part (1) of the function so that it will only try to mark 1 or 0.

Dataplantid <- id_injured(df=df, plantid=plantid, year=year, injuries=c("PrimaryInjury","SecondaryInjury","OtherInjury"), forbidden_values=c("Rust","Insect","Snow break"))

Result

Error: Can't use !!! at top level.

> last_trace()
<error/rlang_error>
Can't use `!!!` at top level.
Backtrace:
     █
  1. └─global::so_injured(...)
  2.   └─`%>%`(...)
  3.     ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  4.     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  6.         └─`_fseq`(`_lhs`)
  7.           └─magrittr::freduce(value, `_function_list`)
  8.             ├─base::withVisible(function_list[[k]](value))
  9.             └─function_list[[k]](value)
 10.               ├─dplyr::mutate(...)
 11.               └─dplyr:::mutate.data.frame(...)
 12.                 ├─base::as.data.frame(mutate(tbl_df(.data), ...))
 13.                 ├─dplyr::mutate(tbl_df(.data), ...)
 14.                 └─dplyr:::mutate.tbl_df(tbl_df(.data), ...)
 15.                   └─rlang::enquos(..., .named = TRUE)
 16.                     └─rlang:::endots(...)
 17.                       └─rlang:::map(...)
 18.                         └─base::lapply(.x, .f, ...)
 19.                           └─rlang:::FUN(X[[i]], ...)
 20.                             └─rlang::splice(...)

Associated data

plantid <- rep(c(1,2,3,4,5), times=c(3,3,3,3,3))
year <- rep(1:3, length.out=length(plantid))
set.seed(42)
PrimaryInjury <- sample(c(NA,NA,NA,"Rust","Insect", "Snow break"), 15, replace=TRUE)
SecondaryInjury <- rep(NA, length.out=length(plantid)) #Filled with NA for example
OtherInjury <- rep(NA, length.out=length(plantid)) #Filled NA for example
df <- data.frame(plantid,year,PrimaryInjury,SecondaryInjury,OtherInjury)
#Right now, PrimaryInjury is a factor, SecondaryInjury and OtherInjury are logical.

Expected output

Dataplantid <- df
Dataplantid$is_injured <- c(0,1,0,0,0,1,0,0,0,1,0,1,1,1,0)

Upvotes: 1

Views: 173

Answers (1)

Konrad Rudolph

Reputation: 546053

There are a few problems, in order from least to most problematic:

  1. Use map_lgl instead of map_int for logical results.
  2. In particular, use map_lgl instead of pmap_int unless you actually intend to map across multiple arguments in parallel, which is not the case here.
  3. Do not assign the function result to a variable inside the function. It does no real harm, but it's unnecessary and misleading.
  4. Do not enquote and then interpolate the forbidden_values values. You want to use a character vector here, not R names.
  5. You were missing a ~ in the purrr call to calculate is_injured.
  6. The logic to identify the injured values does not quite work like this; there may be a way of using pmap_lgl here but I think it’s more straightforward — albeit possibly more verbose — to reshape your data into long format, and work with that.
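For comparison, points 4 to 6 can also be addressed with a row-wise call to pmap_lgl. The following is only a minimal sketch under stated assumptions: flag_injured_rowwise and the toy data frame are hypothetical names introduced for illustration, and the injury columns are assumed to already be character vectors.

```r
library(dplyr)
library(purrr)

# Hypothetical row-wise variant; the function name is illustrative only.
flag_injured_rowwise <- function(df, injuries, forbidden_values) {
    # `injuries` is a plain character vector of column names,
    # so no quasiquotation is needed to pick the columns.
    injury_columns <- df[injuries]

    df %>%
        mutate(is_injured = pmap_lgl(
            injury_columns,
            # Each call receives one row's injury values through `...`.
            function(...) any(c(...) %in% forbidden_values)
        ))
}

# Small made-up example with character columns only.
toy <- data.frame(
    PrimaryInjury    = c("Rust", NA, NA),
    SecondaryInjury  = c(NA, NA, "Insect"),
    OtherInjury      = rep(NA_character_, 3),
    stringsAsFactors = FALSE
)

flagged <- flag_injured_rowwise(
    toy,
    injuries = c("PrimaryInjury", "SecondaryInjury", "OtherInjury"),
    forbidden_values = c("Rust", "Insect", "Snow break")
)
# flagged$is_injured is TRUE, FALSE, TRUE
```

Here forbidden_values stays an ordinary character vector and the anonymous function plays the role of the missing ~ lambda, so no !!! is involved at all.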

Put together, we get:

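library(dplyr)  # provides %>%, select(), mutate() and re-exports enquo() and syms()

id_injured <- function(df, plantid, year, injuries, forbidden_values) {
    # plantid and year are captured but not used in this part of the function.
    plantid <- enquo(plantid)
    year <- enquo(year)
    # Turn the character vector of column names into symbols for select().
    injuries <- syms(injuries)

    df_injuries <- select(df, !!! injuries)

    if (! all(purrr::map_lgl(df_injuries, is.character))) {
        stop("Not all injury variables are character vectors. Convert factors in injuries to character variables")
    }

    # Reshape to long format, then flag each original row that has any forbidden value.
    is_injured <- df_injuries %>%
        mutate(.RowID = row_number()) %>%
        tidyr::gather(Key, Value, -.RowID) %>%
        group_by(.RowID) %>%
        summarize(is_injured = any(Value %in% forbidden_values)) %>%
        pull(is_injured)

    df %>% mutate(is_injured = is_injured)
}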
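The function above stops when an injury column is not a character vector. If you would rather convert such columns and warn, as the question asks, one possible approach in base R is sketched below. coerce_injuries and example_df are hypothetical names, not part of the code above.

```r
# Hypothetical helper: convert every non-character injury column to
# character and warn about the columns that were changed.
coerce_injuries <- function(df, injuries) {
    is_chr <- vapply(df[injuries], is.character, logical(1))
    not_character <- injuries[!is_chr]

    if (length(not_character) > 0) {
        warning(
            "Converting to character: ",
            paste(not_character, collapse = ", "),
            call. = FALSE
        )
        # as.character() keeps factor labels and turns logical NA into NA_character_.
        df[not_character] <- lapply(df[not_character], as.character)
    }
    df
}

# Made-up example: one factor column and one all-NA logical column.
example_df <- data.frame(
    plantid         = c(1, 1, 2),
    PrimaryInjury   = factor(c("Rust", NA, "Insect")),
    SecondaryInjury = c(NA, NA, NA)
)

converted <- suppressWarnings(
    coerce_injuries(example_df, c("PrimaryInjury", "SecondaryInjury"))
)
```

Calling a helper like this on df before id_injured would let the is.character check pass for the example data, where PrimaryInjury may be a factor and the other two columns are logical.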

Upvotes: 1
