EmotionResearcher

Reputation: 65

Using mutate in dplyr on grouped data in custom function in R with a dataframe and columns as arguments

I created a custom function in R for preparing my data for plots. I'm passing a dataframe and two columns (from that dataframe) to my function then using dplyr. The function needs to group by a categorical variable (in this case, age.group) and while the data is still grouped, create a binned version of a continuous variable (to.be.binned) AND get the count for that group. I tried to accomplish both using mutate.

The code inside this function works when run outside of a function. Inside it, I'm passing both a dataframe and column names as arguments, and wrapping the column arguments in {{ }} (curly-curly) for the dplyr calls.

I get the following error:

Error: Column `"age.group"` can't be modified because it's a grouping variable

I don't think my code does anything to modify this variable. I need the count by group in order to get percentages for each group, so I can't ungroup first (which was the suggestion to others getting this same error).

Any suggestions would be deeply appreciated!

Reprex:

library(tidyverse)

simple.df <- data.frame(
  age.group = c("18-30","Under 18","Over 30",
                "Over 30","Over 30","Under 18","18-30","18-30",
                "Over 30","Under 18","18-30","18-30","18-30","18-30",
                "Under 18","18-30","Under 18","18-30","Under 18",
                "Under 18","Under 18","Over 30","Over 30","Over 30",
                "Over 30","Over 30","18-30","Under 18","Over 30",
                "Under 18"),
  to.be.binned = c(98.415794,32.35116,73.29943,
                   81.92012,99.61144,29.665798,97.652885,94.94358,
                   77.798035,24.110243,99.110245,98.415794,99.80469,94.24913,
                   79.665794,98.415794,72.02691,96.332466,94.94358,
                   97.02691,97.02691,92.860245,98.415794,97.02691,
                   90.082466,99.110245,99.80469,98.415794,99.55236,99.110245)
)



bin_by_group <- function(df, my.grouping, bin.this) {
  
  bw = 25
  
  new.df <- df %>%
    group_by({{my.grouping}}) %>%
    mutate(this.binned = cut(as.numeric({{bin.this}}),
                             breaks = seq(0, 100, bw),
                             labels = seq(0 + bw, 100, bw)-(bw/2)),
           n = n()) %>%
    group_by({{my.grouping}}, this.binned) %>%
    summarise(p = n()/n[1]) %>%
    ungroup() %>%
    mutate(this.binned = as.numeric(as.character(this.binned)))
  
  return(new.df)
  
}


test.df <- bin_by_group(simple.df, "age.group", "to.be.binned")
#> Warning in cut(as.numeric(~"to.be.binned"), breaks = seq(0, 100, bw), labels =
#> seq(0 + : NAs introduced by coercion
#> Error: Column `"age.group"` can't be modified because it's a grouping variable

Upvotes: 2

Views: 513

Answers (1)

akrun

Reputation: 886938

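The arguments just need to be passed unquoted, as bare column names. {{ }} is equivalent to enquo + !!, so it captures whatever expression the caller typed; a string such as "age.group" is captured as a string constant, not as a reference to the column.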

bin_by_group(simple.df, age.group, to.be.binned)

-output

# A tibble: 7 x 3
#  age.group this.binned     p
#  <chr>           <dbl> <dbl>
#1 18-30            87.5   1  
#2 Over 30          62.5   0.1
#3 Over 30          87.5   0.9
#4 Under 18         12.5   0.1
#5 Under 18         37.5   0.2
#6 Under 18         62.5   0.1
#7 Under 18         87.5   0.6
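
As a hedged illustration of why the quoted call fails, the sketch below uses a hypothetical helper, group_only (not part of the original code), and a made-up three-row data frame. With a string argument, {{ }} hands group_by() the constant "age.group" rather than the column, so dplyr builds a new single-valued grouping column. That is consistent with the quoted column name in the error message and with the coercion warning from as.numeric().

```r
library(dplyr)

# Hypothetical helper, for illustration only: group by whatever is passed in.
group_only <- function(df, g) {
  df %>% group_by({{ g }})
}

# Made-up toy data, not the data from the question.
toy <- data.frame(age.group = c("a", "b", "a"), x = 1:3)

# Bare name: groups by the existing column, one group per distinct value.
by_name <- group_only(toy, age.group)

# String: the constant is recycled into a new column, giving a single group.
by_string <- group_only(toy, "age.group")

n_groups(by_name)
n_groups(by_string)
group_vars(by_string)
```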

To accept either quoted or unquoted column names, convert each argument to a symbol with ensym() and then unquote it with !!:

bin_by_group <- function(df, my.grouping, bin.this) {
  
  bw = 25
  my.grouping <- ensym(my.grouping)
  bin.this <- ensym(bin.this)
  new.df <- df %>%
    group_by(!! my.grouping) %>%
    mutate(this.binned = cut(as.numeric(!!bin.this),
                             breaks = seq(0, 100, bw),
                             labels = seq(0 + bw, 100, bw)-(bw/2)),
           n = n()) %>%
    group_by(!! my.grouping, this.binned) %>%
    summarise(p = n()/n[1], .groups = 'drop') %>%
    mutate(this.binned = as.numeric(as.character(this.binned)))
  
  return(new.df)
  
}
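
A minimal sketch of what ensym() does here, using a hypothetical wrapper as_symbol (an illustrative name, not from the code above): a bare name and a string should both come back as the same symbol, which is why the rewritten function can take either form.

```r
library(rlang)

# Hypothetical wrapper, for illustration only: capture the argument as a symbol.
as_symbol <- function(x) ensym(x)

from_bare   <- as_symbol(age.group)    # bare column name
from_string <- as_symbol("age.group")  # quoted column name

# Both forms are expected to yield the same symbol.
identical(from_bare, from_string)
```

Note that ensym() only accepts a single name or string, so an expression such as a + b would be rejected rather than silently evaluated.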

-testing

 bin_by_group(simple.df, "age.group", "to.be.binned")
# A tibble: 7 x 3
  age.group this.binned     p
  <chr>           <dbl> <dbl>
1 18-30            87.5   1  
2 Over 30          62.5   0.1
3 Over 30          87.5   0.9
4 Under 18         12.5   0.1
5 Under 18         37.5   0.2
6 Under 18         62.5   0.1
7 Under 18         87.5   0.6

bin_by_group(simple.df, age.group, to.be.binned)
# A tibble: 7 x 3
  age.group this.binned     p
  <chr>           <dbl> <dbl>
1 18-30            87.5   1  
2 Over 30          62.5   0.1
3 Over 30          87.5   0.9
4 Under 18         12.5   0.1
5 Under 18         37.5   0.2
6 Under 18         62.5   0.1
7 Under 18         87.5   0.6
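
If the function only ever needs to take column names as strings, another option is to skip quasiquotation entirely. The sketch below is an alternative, not the approach used above: it assumes dplyr >= 1.0.0 (for across()), and the name bin_by_group_chr and the toy data are made up for illustration. Grouping goes through across(all_of(...)) and the column to bin is looked up with the .data pronoun.

```r
library(dplyr)

# Hypothetical string-only variant of bin_by_group (illustrative name).
bin_by_group_chr <- function(df, my.grouping, bin.this, bw = 25) {
  df %>%
    group_by(across(all_of(my.grouping))) %>%
    # Bin the continuous column and record the size of each group.
    mutate(this.binned = cut(as.numeric(.data[[bin.this]]),
                             breaks = seq(0, 100, bw),
                             labels = seq(bw, 100, bw) - bw / 2),
           n = n()) %>%
    group_by(across(all_of(my.grouping)), this.binned) %>%
    # Share of each group's rows that fall in each bin.
    summarise(p = n() / n[1], .groups = "drop") %>%
    mutate(this.binned = as.numeric(as.character(this.binned)))
}

# Made-up toy data: group "a" is split across two bins, group "b" sits in one.
toy <- data.frame(g = c("a", "a", "b", "b"), v = c(10, 90, 60, 70))
res <- bin_by_group_chr(toy, "g", "v")
res
```

This version rejects bare names, so it trades the flexibility of ensym() for a simpler calling convention.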

Upvotes: 2
