jtanman

Reputation: 684

Persistent Mutate Error Caused by Previous group_by/summarise

I'm seeing a mutate() error that appears or disappears depending on seemingly unrelated code that was run earlier. For example:

library(tidyverse)
library(scales)
#> 
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#> 
#>     discard
#> The following object is masked from 'package:readr':
#> 
#>     col_factor

df_test <- tibble(group = c('a', 'a', 'b', 'b', 'b'), hour=parse_factor(as.character(c(1, 2, 1, 2, 1))), x = c(1,2,3, 4, 5), y=c(5, 6, 7, 8, 9))

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )

summarise_df_input <- function(.data, func, group_vars) {
  df_agg <- .data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(across(everything(), func))
  
  return(df_agg)
}

df_grouped <- df_test %>% summarise_df_input(mean, 'group')
#> Warning in mean.default(hour): argument is not numeric or logical: returning NA

#> Warning in mean.default(hour): argument is not numeric or logical: returning NA

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )
#> Error: Problem with `mutate()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

return_data <- df_test %>%
  dplyr::mutate(
    across(.cols = c(x, y), scales::label_dollar())
  )

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )

Created on 2021-03-05 by the reprex package (v1.0.0)

The above is a reprex of what's going on. Does anyone have any idea what's happening? This is using dplyr 1.0.5, by the way.

Answers (1)

akrun

Reputation: 887711

The function uses everything(); it should use where(is.numeric) instead. Besides the grouping column there is an 'hour' column, which is a factor, and mean() only works on numeric (or logical) variables.

summarise_df_input <- function(.data, func, group_vars) {
  .data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(across(where(is.numeric), func), .groups = 'drop')
}

Testing:

df_test %>% 
    summarise_df_input(mean, 'group')
# A tibble: 2 x 3
#  group     x     y
#* <chr> <dbl> <dbl>
#1 a       1.5   5.5
#2 b       4     8 

As for the error itself, it may be a bug in dplyr. Placing the across() call before the hour assignment inside mutate() avoids the error:

return_data <- df_test %>%
  dplyr::mutate(
    across(.cols = c(x, y), scales::label_dollar()),
    hour = paste(hour, ':00')
  )
