Reputation: 1338
I would like to extract certain rows from a data frame containing one colum as Date (column C). Here is a small example:
The output should look like this:
Before <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0032","0033","0039","0039","0039","0041","0054"),
B=c(11,12,11,11,12,12,12,11,"NA","NA",11,11,11),
C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2013-08-21","2014-07-11","2012-10-06","2012-10-06","2013-10-22","2014-05-28","2014-03-26"))
After <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0033","0039","0041","0054"),
B=c(11,12,11,11,12,12,11,11,11,11),
C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2014-07-11","2013-10-22","2014-05-28","2014-03-26"))
So would I'm aiming for is:
I wasn't able to find a solution using subset, unique etc. Any help appreciated!
Upvotes: 0
Views: 198
Reputation: 118779
Here are two data.table
variations depending on the assumptions on data:
Assuming that your data already has the latest date for each group of A
as the last element:
require(data.table)
setDT(Before)[, .SD[.N], by=A]
.SD
holds the S
ubset of D
ata for each group in A
and .N
holds the number of observations in that group. So, .SD[.N]
gives us the last observation, for each group.
Without any assumptions:
require(data.table)
setDT(Before)[, C := as.Date(C)][, .SD[which.max(C)], by=A]
Here, first we replace C
with as.Date(C)
using data.table
's :=
operator which modifies by reference (without making any copy, hence fast+memory efficient). Then, for each A
data subset, we subset the row correspondng to the maximum value of C
.
HTH
Upvotes: 1
Reputation: 123
require(dplyr)
Before %>%
mutate(C=as.Date(C)) %>%
group_by(A) %>%
arrange(A,desc(C)) %>%
filter(row_number()==1)
#Source: local data frame [10 x 3]
#Groups: A
# A B C
#1 0010 11 2014-01-07
#2 0011 12 2013-06-03
#3 0012 11 2013-07-29
#4 0015 11 2014-07-14
#5 0024 12 2012-12-17
#6 0032 12 2013-08-21
#7 0033 11 2014-07-11
#8 0039 11 2013-10-22
#9 0041 11 2014-05-28
#10 0054 11 2014-03-26
Upvotes: 1
Reputation: 10223
By using the fact that dates act like numerics something like the following might do the trick:
Before$C <- as.Date(Before$C) # Convert to dates
ans <- aggregate(C ~ A + B, max, data = Before) # Aggregate date, choose the last date
ans <- ans[ans$B != "NA", ] # Remove NA in col B
print(ans)
# A B C
#1 0010 11 2014-01-07
#2 0012 11 2013-07-29
#3 0015 11 2014-07-14
#4 0033 11 2014-07-11
#5 0039 11 2013-10-22
#6 0041 11 2014-05-28
#7 0054 11 2014-03-26
#8 0011 12 2013-06-03
#9 0024 12 2012-12-17
#10 0032 12 2013-08-21
The max
of type Date
will return the most recent one.
Upvotes: 0
Reputation: 132626
split-apply-combine:
Before$C <- as.Date(Before$C)
library(plyr)
ddply(Before, .(A), function(df) {
df <- df[df$C==max(df$C),]
df[!duplicated(df),]
})
# A B C
#1 0010 11 2014-01-07
#2 0011 12 2013-06-03
#3 0012 11 2013-07-29
#4 0015 11 2014-07-14
#5 0024 12 2012-12-17
#6 0032 12 2013-08-21
#7 0033 11 2014-07-11
#8 0039 11 2013-10-22
#9 0041 11 2014-05-28
#10 0054 11 2014-03-26
Upvotes: 0