Reputation: 1359
I have data that looks like the following:
dat<-data.frame(ID=c("A","B","B",NA,"C"),Date=as.Date(c("2012-06-06","2012-07-07","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat)
ID Date
A 2012-06-06
B 2012-07-07
B 2014-07-07
<NA> <NA>
C <NA>
I am trying to keep only the most recent row for each ID, without removing any NAs, to get something like:
dat1<-data.frame(ID=c("A","B",NA,"C"),Date=as.Date(c("2012-06-06","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat1)
ID Date
A 2012-06-06
B 2014-07-07
<NA> <NA>
C <NA>
I have tried the following with dplyr:
library(dplyr)
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date&!is.na(Date)))
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date,na.rm=TRUE))
The first yields an error and the second still removes NAs. Any suggestions?
Upvotes: 1
Views: 61
Reputation: 13570
Base
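# Make NA an explicit factor level so the NA ID forms its own group
dat$ID <- addNA(dat$ID)
# Sort newest first; order() puts NA dates last by default
dat <- dat[order(dat$Date, decreasing = TRUE), ]
# Take the first (most recent) Date per ID; na.pass keeps the NA rows
aggregate(Date ~ ID, dat, FUN = head, 1, na.action = na.pass)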
dplyr
Using slice in dplyr is pretty neat:
dat %>%
  group_by(ID) %>%
  arrange(desc(Date)) %>%
  slice(1)
Output
# A tibble: 4 x 2
# Groups: ID [4]
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 C NA
4 NA NA
Upvotes: 1
Reputation: 5017
An easy solution:
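# Sort newest first, with NA dates last
dat <- dat[order(dat$Date, na.last = TRUE, decreasing = TRUE), ]
# Keep the first row seen for each ID (NA counts as its own ID)
dat <- dat[!duplicated(dat$ID), ]
# Restore the original row order (numeric, so row 10 sorts after row 2)
dat[order(as.integer(row.names(dat))), ]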
ID Date
1 A 2012-06-06
3 B 2014-07-07
4 <NA> <NA>
5 C <NA>
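The same order-then-deduplicate idea can be wrapped in a small function for reuse. This is only a sketch in base R: the name `latest_per_id` is made up for illustration, and it assumes the columns are called `ID` and `Date` and that the data frame has default integer row names.

```r
# Hypothetical helper: keep the most recent row per ID,
# keeping NA IDs and IDs whose only date is NA.
latest_per_id <- function(df) {
  # Newest dates first; NA dates go last, so an NA date is only
  # kept when an ID has no dated row at all.
  sorted <- df[order(df$Date, decreasing = TRUE, na.last = TRUE), ]
  # duplicated() treats NA like any other value, so the NA ID is kept once.
  kept <- sorted[!duplicated(sorted$ID), ]
  # Restore the original row order via the numeric row names.
  kept[order(as.integer(row.names(kept))), ]
}

dat <- data.frame(
  ID = c("A", "B", "B", NA, "C"),
  Date = as.Date(c("2012-06-06", "2012-07-07", "2014-07-07", NA, NA)),
  stringsAsFactors = FALSE
)
res <- latest_per_id(dat)
print(res)
```

Because the function does not modify `dat`, it can be run several times on the same input without re-creating the data.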
Upvotes: 1
Reputation: 20095
You can try:
library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate(latest = ifelse(Date == max(Date), 1L, 0L)) %>%
  filter(is.na(latest) | latest == 1) %>%
  select(-latest)
Result:
# A tibble: 4 x 2
# Groups: ID [4]
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 <NA> NA
4 C NA
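The `is.na(latest)` check is what keeps the NA rows: comparing a missing date gives NA rather than FALSE, and `filter()` only keeps rows where the condition is TRUE. A minimal base R sketch of that behavior, using a made-up vector `d` that is not part of the question's data:

```r
# Hypothetical example vector: one real date and one missing date.
d <- as.Date(c("2014-07-07", NA))

# Comparing against the maximum gives NA for the missing entry, not FALSE.
is_latest <- d == max(d, na.rm = TRUE)
print(is_latest)

# Treat NA as "keep" explicitly, mirroring the is.na() check in the filter.
keep <- is.na(is_latest) | is_latest
print(keep)
```

This is also why filtering on `Date == max(Date, na.rm = TRUE)` alone still drops the rows with missing dates.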
Upvotes: 0
Reputation: 374
Using dplyr:
library(dplyr)
dat <- data.frame(
  ID = c("A", "B", "B", NA, "C"),
  Date = as.Date(c("2012-06-06", "2012-07-07", "2014-07-07", NA, NA)),
  stringsAsFactors = FALSE
)
# Sort newest first within each ID, then keep the first row per ID
df <- dat %>%
  arrange(ID, desc(Date)) %>%
  group_by(ID) %>%
  filter(row_number() == 1)
Output:
# A tibble: 4 x 2
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 C NA
4 <NA> NA
Upvotes: 0
Reputation: 2678
Using data.table:
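library(data.table)
setDT(dat)
# Latest date per ID (NA when the group contains a missing date)
dat[, max_date := max(Date), by = ID]
# Keep rows that hold the latest date, plus every row with a missing date
dat <- dat[(!is.na(Date) & Date == max_date) | is.na(Date), ]
# Drop the helper column
dat[, max_date := NULL]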
Output:
ID Date
1: A 2012-06-06
2: B 2014-07-07
3: NA <NA>
4: C <NA>
Upvotes: 3