Sparsh Goyal

Reputation: 11

R version 3.6.3 (2020-02-29) | Using package dplyr_1.0.0 | Unable to perform summarise() function

I am trying to use the basic summarise() function but keep getting the same error.

I have a large number of CSV files, each with 4 columns. I read them into R with lapply and combine them with rbind. Next, I need the number of complete observations for each ID.

Error:

 Problem with `summarise()` input `complete_cases`.
    x unused argument (Date)
    i Input `complete_cases` is `n(Date)`.
    i The error occured in group 1: ID = 1.

Code:

library(dplyr)
merged <-do.call(rbind,lapply(list.files(),read.csv))
merged <- as.data.frame(merged)
remove_na <- merged[complete.cases(merged),]
new_data <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n(Date))

Here is what the data looks like

Upvotes: 1

Views: 75

Answers (1)

Dan Chaltiel

Reputation: 8494

The problem comes from n(), not from summarise().

The help page ?n shows that n() takes no arguments, so it should be called like this:

new_data_count <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n())

This counts the number of rows in each ID group, independently of the Date column. You could also use the shortcut function count():

new_data_count <- remove_na %>% count(ID)

If you want to count the distinct Date values within each ID instead, use n_distinct():

new_data_count_dates <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n_distinct(Date))
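
To make the difference between the three approaches concrete, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and only stand in for the merged CSV data:

```r
library(dplyr)

# Hypothetical toy data standing in for the merged CSV files
toy <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  Date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"),
  stringsAsFactors = FALSE
)

# Rows per ID: n() takes no argument
rows_per_id <- toy %>% group_by(ID) %>% summarise(complete_cases = n())

# Same counts via the count() shortcut (the result column is called n)
rows_per_id_short <- toy %>% count(ID)

# Distinct Date values per ID
dates_per_id <- toy %>% group_by(ID) %>% summarise(complete_cases = n_distinct(Date))
```

With this toy data, ID 1 has 3 rows but only 2 distinct dates, while ID 2 has 2 rows and 2 distinct dates.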

As a side note, you could have written your code with the purrr::map family, which is more convenient than the *apply functions because the suffix specifies the return type. This could look like this:

library(purrr)
remove_na = map_dfr(list.files(), read.csv) %>% na.omit()

Here, map_dfr returns a single data.frame by binding the results by row; map_dfc does the same but binds them by column.
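
A small sketch of the row versus column binding, using an in-memory list instead of files so it runs anywhere. The list pieces and its values are hypothetical. Note that both functions need dplyr to be installed:

```r
library(purrr)

# Hypothetical in-memory pieces standing in for the files from list.files()
pieces <- list(
  data.frame(ID = 1, value = 10),
  data.frame(ID = 2, value = 20)
)

# Bind by row: one row per piece, columns matched by name
by_row <- map_dfr(pieces, identity)

# Bind by column: pieces placed side by side, duplicated names get repaired
by_col <- map_dfc(pieces, identity)

dim(by_row)  # 2 rows, 2 columns
dim(by_col)  # 1 row, 4 columns
```

For files that all share the same 4 columns, binding by row is the one that matches the original rbind approach.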

Upvotes: 1
