Reputation: 2538
df <- data.frame(group=c(1,2,4,2,1,4,2,3,3),
ts=c("2014-02-13","2014-06-01","2014-02-14","2014-02-11","2013-02-01","2014-02-02","2014-03-21","2014-12-01","2014-02-11"),
letter=letters[1:9])
df$ts <- as.Date(df$ts,format='%Y-%m-%d')
I want an operation that returns the complete rows containing the minimum timestamp per group. In this case, that would be:
group ts letter
1 2013-02-01 e
4 2014-02-02 f
2 2014-02-11 d
3 2014-02-11 i
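A quick and dirty (and slow) base R solution would be:
# sort by date, then number the sorted rows 1..n
dfo <- data.frame(df[order(df$ts, decreasing = FALSE), ], index = seq_len(nrow(df)))
# the smallest position within each group is that group's earliest row
mins <- tapply(dfo$index, dfo$group, min)
dfo[dfo$index %in% mins, ]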
Intuitively, I think that if there were a way to add an order index within each group, I could just filter to the rows where that column's value is 1, but I'm not sure how to do it without a lot of subsetting and rejoining.
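The per-group order index described above can be built in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a minimal sketch, assuming ties are broken by original row order (ties.method = "first"); the column name grp_rank is just an illustrative choice:

```r
df <- data.frame(group = c(1, 2, 4, 2, 1, 4, 2, 3, 3),
                 ts = as.Date(c("2014-02-13", "2014-06-01", "2014-02-14",
                                "2014-02-11", "2013-02-01", "2014-02-02",
                                "2014-03-21", "2014-12-01", "2014-02-11")),
                 letter = letters[1:9])

# Rank each row by date within its group; ties keep their original row order.
# FUN must be passed by name because ave() treats other unnamed args as groupings.
df$grp_rank <- ave(as.numeric(df$ts), df$group,
                   FUN = function(x) rank(x, ties.method = "first"))

# Keep the first-ranked row of each group, then sort by date like the desired output.
first_rows <- df[df$grp_rank == 1, c("group", "ts", "letter")]
first_rows <- first_rows[order(first_rows$ts), ]
print(first_rows)
```

No merging back is needed, because ave() keeps the result in the same row order as df.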
Upvotes: 1
Views: 1592
Reputation: 1707
Here's a one-liner using base R.
df[sapply(split(df,df$group), function(x) row.names(x)[which.min(x$ts)] ),]
Breaking it down some:
list.by.group <- split(df,df$group)
#a vector of the row names corresponding to the earliest date in each group
names.of.which.min <- sapply(list.by.group, function(x) row.names(x)[which.min(x$ts)])
#subset the data frame by row name
df[names.of.which.min,]
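The same split/apply/combine idea can also skip the row-name lookup by returning the earliest row of each piece directly and stacking the pieces back together. This is a sketch using only base R (split(), lapply(), do.call(), rbind()); with many groups, binding many one-row data frames can be slow:

```r
df <- data.frame(group = c(1, 2, 4, 2, 1, 4, 2, 3, 3),
                 ts = as.Date(c("2014-02-13", "2014-06-01", "2014-02-14",
                                "2014-02-11", "2013-02-01", "2014-02-02",
                                "2014-03-21", "2014-12-01", "2014-02-11")),
                 letter = letters[1:9])

# One data frame per group, each reduced to its earliest row.
earliest <- lapply(split(df, df$group), function(x) x[which.min(x$ts), ])

# Stack the one-row data frames back into a single data frame (ordered by group).
result <- do.call(rbind, earliest)
print(result)
```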
Upvotes: 0
Reputation: 99331
You could use dplyr:
library(dplyr)
group_by(df, group) %>% summarise(min = min(ts), letter = letter[which.min(ts)])
# group min letter
# 1 1 2013-02-01 e
# 2 2 2014-02-11 d
# 3 3 2014-02-11 i
# 4 4 2014-02-02 f
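Note that summarise() only returns the columns you name, so with more columns each one would need its own letter[which.min(ts)]-style expression. To keep complete rows, one option is to filter within each group instead. This is a sketch, assuming dplyr is installed; unlike which.min(), filter(ts == min(ts)) keeps every row that ties for the earliest date in a group:

```r
library(dplyr)

df <- data.frame(group = c(1, 2, 4, 2, 1, 4, 2, 3, 3),
                 ts = as.Date(c("2014-02-13", "2014-06-01", "2014-02-14",
                                "2014-02-11", "2013-02-01", "2014-02-02",
                                "2014-03-21", "2014-12-01", "2014-02-11")),
                 letter = letters[1:9])

earliest <- df %>%
  group_by(group) %>%
  filter(ts == min(ts)) %>%  # keeps all columns; ties for the minimum are all kept
  ungroup() %>%
  arrange(ts)                # order by date, as in the desired output
print(earliest)
```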
You could also slice the ranked rows:
group_by(df, group) %>%
  mutate(rank = row_number(ts)) %>%
  arrange(rank) %>%
  slice(1)
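In dplyr 1.0.0 and later, slice_min() combines the ranking and slicing steps. This is a sketch, assuming a recent dplyr; with_ties = FALSE makes it return exactly one row per group even when several rows share the minimum date:

```r
library(dplyr)

df <- data.frame(group = c(1, 2, 4, 2, 1, 4, 2, 3, 3),
                 ts = as.Date(c("2014-02-13", "2014-06-01", "2014-02-14",
                                "2014-02-11", "2013-02-01", "2014-02-02",
                                "2014-03-21", "2014-12-01", "2014-02-11")),
                 letter = letters[1:9])

earliest <- df %>%
  group_by(group) %>%
  slice_min(ts, n = 1, with_ties = FALSE) %>%  # one earliest row per group
  ungroup()
print(earliest)
```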
Upvotes: 5
Reputation: 59355
Here's a data.table solution. You seem to want the result ordered by ts, not by group. This does that.
library(data.table)
setDT(df)[,.SD[which.min(ts)],by=group][order(ts)]
# group ts letter
# 1: 1 2013-02-01 e
# 2: 4 2014-02-02 f
# 3: 2 2014-02-11 d
# 4: 3 2014-02-11 i
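A commonly used data.table variant avoids building .SD for every group: collect the row number (.I) of each group's minimum, then subset the table once. This is a sketch, assuming data.table is installed; it uses as.data.table() rather than setDT() so the original data frame is left unchanged (setDT() converts it in place):

```r
library(data.table)

df <- data.frame(group = c(1, 2, 4, 2, 1, 4, 2, 3, 3),
                 ts = as.Date(c("2014-02-13", "2014-06-01", "2014-02-14",
                                "2014-02-11", "2013-02-01", "2014-02-02",
                                "2014-03-21", "2014-12-01", "2014-02-11")),
                 letter = letters[1:9])

dt <- as.data.table(df)

# Row number of the earliest ts within each group (returned in column V1).
idx <- dt[, .I[which.min(ts)], by = group]$V1

# Subset once, then order by date.
earliest <- dt[idx][order(ts)]
print(earliest)
```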
Upvotes: 3