R: Merging rows by duplicates in first column

Question

I have a large dataset with duplicated values in the first column, like so:

ID         date      var1   var2
person1    052016    509    1678  
person2    122016    301    NA
person1    072016    NA     45

I want to combine the rows for each ID, taking the values from the most recent "date"; where a value on that row is NA, I want the most recent value that is not NA instead. The output should look like this:

ID         date      var1   var2 
person2    122016    301    NA
person1    072016    509    45

I have tried this, but it didn't work:

library(dplyr)

data %>% group_by(ID) %>% summarise_all(funs(max(data$date))) %>% funs(first(.[!is.na(.)]))

What should I use to make this work on the whole dataset?

www · Accepted Answer

A solution using dplyr.

library(dplyr)

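dat2 <- dat %>%
  arrange(ID, desc(date)) %>%   # newest row first within each ID
  group_by(ID) %>%
  # for every non-grouping column, keep the first non-NA value
  summarise(across(everything(), ~ first(.x[!is.na(.x)]))) %>%
  ungroup()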
dat2
# # A tibble: 2 x 4
#   ID        date  var1  var2
#   <chr>    <int> <int> <int>
# 1 person1  72016   509    45
# 2 person2 122016   301    NA
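To see the per-column rule in isolation (the newest value wins, otherwise the newest non-NA one), here is a small base-R sketch of what `first(.x[!is.na(.x)])` does after `arrange()`. The function name `latest_non_na` is hypothetical, and it assumes the dates have already been turned into keys that sort chronologically, such as year * 100 + month.

```r
# Hypothetical helper: pick the value from the newest date, falling back to
# the newest non-NA value. date_keys must sort chronologically (e.g. 201607).
latest_non_na <- function(values, date_keys) {
  newest_first <- values[order(date_keys, decreasing = TRUE)]
  non_missing <- newest_first[!is.na(newest_first)]
  # Indexing past the end of an empty vector yields an NA of the same type,
  # so an all-NA group (like person2's var2) returns NA.
  non_missing[1]
}

latest_non_na(c(509L, NA), c(201605, 201607))    # person1, var1
# [1] 509
latest_non_na(c(1678L, 45L), c(201605, 201607))  # person1, var2
# [1] 45
latest_non_na(NA_integer_, 201612)               # person2, var2
# [1] NA
```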

DATA

dat <- read.table(text = "ID         date      var1   var2
person1    '052016'    509    1678  
person2    '122016'    301    NA
person1    '072016'    NA     45",
                  header = TRUE, stringsAsFactors = FALSE)
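One caveat: `read.table()` still converts the quoted dates to integers (hence `72016` in the output), so `arrange(ID, desc(date))` sorts raw MMYYYY numbers. That works here because every date is in 2016, but December 2015 (`122015`) would sort as newer than May 2016 (`52016`). The following base-R sketch builds a YYYYMM key before sorting; the `date_key` column and `value_cols` vector are additions for illustration, and the MMYYYY format is assumed from the sample data.

```r
dat <- read.table(text = "ID         date      var1   var2
person1    '052016'    509    1678
person2    '122016'    301    NA
person1    '072016'    NA     45",
                  header = TRUE, stringsAsFactors = FALSE)

# read.table turns '052016' into the integer 52016, so restore the
# leading zero before splitting MMYYYY into month and year.
mmyyyy <- sprintf("%06d", as.integer(dat$date))
dat$date_key <- as.integer(substr(mmyyyy, 3, 6)) * 100L +
  as.integer(substr(mmyyyy, 1, 2))

# Sort newest first within each ID.
dat <- dat[order(dat$ID, -dat$date_key), ]

# For every ID, take the first non-NA value of each column.
value_cols <- c("date", "var1", "var2")
dat2 <- do.call(rbind, lapply(split(dat, dat$ID), function(rows) {
  data.frame(
    ID = rows$ID[1],
    lapply(rows[value_cols], function(col) col[!is.na(col)][1]),
    stringsAsFactors = FALSE
  )
}))
rownames(dat2) <- NULL
dat2
```

For this sample the result matches the dplyr answer; the key only changes the outcome when an ID has dates in different years.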
