Jack Neath

Reputation: 3

RStudio error: `n()` must only be used inside dplyr verbs

install.packages(c("ggplot2", "ggpubr", "tidyverse", "broom", "AICcmodavg", "dplyr"))
library(ggplot2)
library(ggpubr)
library(tidyverse)
library(broom)
library(AICcmodavg)

library(dplyr)

my_data <- read.table(file = "clipboard", 
                      sep = "\t", header=TRUE)


group_by(my_data, group) %>%
  suppressWarnings(as.numeric(co2))  
***dplyr::summarise (count = dplyr::n(), mean = dplyr::mean("CO2", na.rm = TRUE),sd = dplyr::sd("CO2", na.rm = TRUE))***

I'm working towards a one-way ANOVA on some data for my dissertation, but the last line (marked with *** above) keeps returning the following error:

Error: n() must only be used inside dplyr verbs.

What could be going wrong? Thanks for any help you can give!

Upvotes: 0

Views: 1853

Answers (1)

Ben Bolker

Reputation: 226162

The error message is admittedly a little opaque, but we can replicate it with a call as simple as:

summarise(count = n())

Because you haven't piped the result of the preceding lines into the final summarise(), that call is evaluated as a standalone function call with no data argument, so n() ends up being run outside any dplyr verb. By contrast, summarise(mtcars, count = n()) works fine.
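To see the difference side by side, here is a minimal sketch using the built-in mtcars data. The tryCatch() wrapper is only there so the script keeps running after the failure; the exact wording of the error message may differ between dplyr versions, so it is captured rather than assumed.

```r
library(dplyr)

# Standalone call with no data argument: n() is evaluated outside any
# dplyr verb, so the call fails. Capture the message instead of stopping.
standalone <- tryCatch(
  summarise(count = n()),
  error = function(e) conditionMessage(e)
)

# Same call with a data frame supplied as the first argument: this works.
with_data <- summarise(mtcars, count = n())

print(standalone)
print(with_data)
```

The first result is the error text, the second is a one-row data frame holding the row count of mtcars.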

You might be looking for a pipeline like this (illustrated with built-in data set mtcars):

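library(tidyverse)
# every step is connected with %>% so summarise() receives the grouped data
group_by(mtcars, cyl) %>%
  # convert the column to numeric inside the pipeline
  mutate(across(mpg, as.numeric)) %>%
  # n(), mean() and sd() are now evaluated per group
  summarise(count = n(),
            mean = mean(mpg, na.rm = TRUE),
            sd = sd(mpg, na.rm = TRUE))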
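Applied to the data in the question, the same pipeline might look like the sketch below. The small data frame is a made-up stand-in for the clipboard data, and the column names group and co2 are assumptions taken from the question's code. Two further details differ from the original attempt: mean() and sd() come from base R and the stats package (dplyr does not export them, so dplyr::mean() will not work), and the column is referred to by its bare name rather than as the quoted string "CO2".

```r
library(dplyr)

# Hypothetical stand-in for the clipboard data; column names are assumed.
my_data <- data.frame(
  group = c("a", "a", "b", "b"),
  co2   = c("1.5", "2.5", "3.0", "n/a"),
  stringsAsFactors = FALSE
)

result <- my_data %>%
  # convert inside mutate(); entries that are not numbers become NA
  mutate(co2 = suppressWarnings(as.numeric(co2))) %>%
  group_by(group) %>%
  summarise(
    count = n(),                      # rows per group, including NA rows
    mean  = mean(co2, na.rm = TRUE),  # base::mean on the unquoted column
    sd    = sd(co2, na.rm = TRUE)     # stats::sd; NA if only one value is left
  )

print(result)
```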

By the way, if you're doing an ANOVA you probably don't want to collapse the data to summary statistics first (unless someone is making you do the computations by hand). It is more natural to fit lm() to the non-aggregated data and call anova(lm(...)) on the result.
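As a rough illustration with mtcars again, a one-way ANOVA on the raw rows could be set up as follows. Treating cyl as the grouping factor and mpg as the response is just an example choice, not something taken from the question's data.

```r
# Fit a linear model to the non-aggregated data; factor() makes cyl a
# categorical predictor so each cylinder count is its own group.
fit <- lm(mpg ~ factor(cyl), data = mtcars)

# One-way ANOVA table for the fitted model.
tab <- anova(fit)
print(tab)
```

For your own data, the formula would use your response and grouping columns in place of mpg and cyl.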

Upvotes: 1
