Reputation: 11
Trying to perform the basic Summarise() function but getting the same error again and again!
I have a large number of CSV files, each with 4 columns. I read them into R with lapply and combine them with rbind. Next, I need the number of complete observations for each ID.
Error:
*Problem with `summarise()` input `complete_cases`.
x unused argument (Date)
i Input `complete_cases` is `n(Date)`.
i The error occured in group 1: ID = 1.*
Code:
library(dplyr)
merged <-do.call(rbind,lapply(list.files(),read.csv))
merged <- as.data.frame(merged)
remove_na <- merged[complete.cases(merged),]
new_data <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n(Date))
Here is what the data looks like
Upvotes: 1
Views: 75
Reputation: 8494
The problem does not come from summarise() but from n(). If you look at the help page (?n), you will see that n() takes no arguments and is used like this:
new_data_count <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n())
This counts the number of rows in each ID group, independently of the Date column. You could also use the shortcut function count():
new_data_count <- remove_na %>% count(ID)
If you want to count the distinct Date values instead, you might want to use n_distinct():
new_data_count_dates <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n_distinct(Date))
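To make the difference between the three approaches concrete, here is a minimal sketch on a made-up data frame. The `toy` data and its `value` column are hypothetical stand-ins for the merged CSV files, not the asker's real data.

```r
library(dplyr)

# Hypothetical toy data standing in for the merged CSV files
toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  Date  = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"),
  value = c(1.5, 2.0, 2.5, NA, 4.2)
)

# Keep only rows without missing values (drops the fourth row)
complete <- toy[complete.cases(toy), ]

# n() takes no arguments: it counts the rows in each group
rows_per_id <- complete %>%
  group_by(ID) %>%
  summarise(complete_cases = n())

# count() is a shortcut for the same thing; its result column is named n
rows_per_id_short <- complete %>% count(ID)

# n_distinct() counts the unique values of a column within each group
dates_per_id <- complete %>%
  group_by(ID) %>%
  summarise(distinct_dates = n_distinct(Date))
```

With this toy data, ID 1 has three complete rows but only two distinct dates, so the first and last summaries differ for that group.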
As a side note, you could have read the files with purrr's map functions instead of lapply and rbind. They are more convenient than the *apply family because the suffix specifies the return type. This could look like this:
library(purrr)
remove_na = map_dfr(list.files(), read.csv) %>% na.omit()
Here, map_dfr returns a single data.frame by row-binding the results. You could also use map_dfc, which binds the results by column instead.
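For a self-contained way to try the map_dfr approach, the sketch below writes two small CSV files to a temporary folder and reads them back. The file names, the folder, and the columns are all hypothetical. The `pattern` and `full.names` arguments to list.files() are an assumption about what you would want in practice, so that only CSV files are picked up and the paths work from any working directory.

```r
library(dplyr)
library(purrr)

# Hypothetical demo folder holding two small CSV files
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)

write.csv(
  data.frame(ID = c(1, 1), Date = c("2020-01-01", "2020-01-02"), value = c(1, NA)),
  file.path(tmp_dir, "a.csv"), row.names = FALSE
)
write.csv(
  data.frame(ID = c(2, 2), Date = c("2020-01-01", "2020-01-02"), value = c(3, 4)),
  file.path(tmp_dir, "b.csv"), row.names = FALSE
)

# Only pick up .csv files, and keep full paths so read.csv can find them
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, row-bind the results, then drop rows containing NA
combined <- map_dfr(files, read.csv) %>% na.omit()

# Number of complete rows per ID
complete_per_id <- combined %>% count(ID)
```

Note that map_dfr relies on dplyr for the row-binding step, so both packages need to be installed.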
Upvotes: 1