Count repeated rows and the first non-NA appearance in a data frame

Question

I have the following sample dataset:

library(tidyverse)
dataset <- data.frame(id = c("A","A","B","B","C","A","C","B"), 
                  value = c(100, 500, 200, 100, 500, 300, 400, 100), 
                  status = c(NA, "Valid", NA, NA, "Pend", NA, NA, NA), 
                  stringsAsFactors = FALSE)

What I need is to extract this unique ids with the highest value and have how much it repeats and the first non-NA status.

I have solved it in this way:

dataset_count <- dataset %>% group_by(id) %>% 
summarise(count = n(), comment = max(status, na.rm = TRUE)) %>% ungroup()

dataset_cross <- dataset %>% arrange(desc(value)) %>% 
left_join(dataset_count) %>% distinct(id, .keep_all = TRUE)

but since my original dataset has 120 variables and more rules to follow I would like to know if there is a way to make it more compact. For example I read about coalesce, but it doesn't allow me to extract the first NA in a grouped data. Please, could you give some advice? Thank you.

Ronak Shah · Accepted Answer

You could get max value using max, count number of rows using n() and first non-NA value with which.max for each id.

library(dplyr)

dataset %>%
  group_by(id) %>%
  summarise(value = max(value), 
            count = n(), 
            status = status[which.max(!is.na(status))])

#  id    value count status
#      
#1 A       500     3 Valid 
#2 B       200     3 NA    
#3 C       500     2 Pend

Count repeated rows and the first non-NA appearance in a data frame

Answers (2)

Related Questions