felixmc
felixmc

Reputation: 518

Subset data.table by evaluating multiple columns

How to return 1 row for each unique name by most recent (latest) Type?

DataTable with 6 rows:

example <- data.table(c("Bob","May","Sue","Bob","Sue","Bob"), 
                      c("A","A","A","A","B","B"),
              as.Date(c("2010/01/01", "2010/01/01", "2010/01/01", 
                   "2012/01/01", "2012/01/11", "2014/01/01")))
setnames(example,c("Name","Type","Date"))
setkey(example,Name,Date)

Should return 5 rows:

# 1:  Bob    A 2012-01-01
# 2:  Bob    B 2014-01-01
# 3:  May    A 2010-01-01
# 4:  Sue    A 2010-01-01
# 5:  Sue    B 2012-01-11

Upvotes: 0

Views: 811

Answers (2)

Arun
Arun

Reputation: 118779

Since you've already sorted by Name and Date, you can use unique (which calls unique.data.table) function on the columns Name and Type, with fromLast = TRUE.

require(data.table) ## >= v1.9.3
unique(example, by=c("Name", "Type"), fromLast=TRUE)
#    Name Type       Date
# 1:  Bob    A 2012-01-01
# 2:  Bob    B 2014-01-01
# 3:  May    A 2010-01-01
# 4:  Sue    A 2010-01-01
# 5:  Sue    B 2012-01-11

This'll pick the last row for each Name,Type group. Hope this helps.

PS: As @mso points out, this needs 1.9.3 because the fromLast argument was implemented only in 1.9.3 (available from github).

Upvotes: 3

rnso
rnso

Reputation: 24535

Following versions of @Arun answer work:

unique(example[rev(order(Name,Date))], by=c("Name", "Type"), fromLast=TRUE)[order(Name,Date)]
   Name Type       Date
1:  Bob    A 2012-01-01
2:  Bob    B 2014-01-01
3:  May    A 2010-01-01
4:  Sue    A 2010-01-01
5:  Sue    B 2012-01-11

unique(example[order(Name, Date, decreasing=T)], by=c("Name","Type"))[order(Name, Date)]
   Name Type       Date
1:  Bob    A 2012-01-01
2:  Bob    B 2014-01-01
3:  May    A 2010-01-01
4:  Sue    A 2010-01-01
5:  Sue    B 2012-01-11

Upvotes: 1

Related Questions