Reputation: 961
I'd really appreciate it if anyone could help out with the task below. I've searched the R topics but haven't found anything to specifically address this. I've also tried which, table but not really getting anywhere with it.
x_1 <- c("A1", "A1", "B10", "B10", "B10", "C100", "C100", "C100")
y_1 <- as.Date(c("2010-11-17", "2010-11-17", "2010-11-17", "2009-02-14",
"2004-10-20", "2008-03-21", "2010-09-30", "2004-06-22"))
A <- data.frame(x_1, y_1)
x_2 <- c("A1", "B10", "B10", "B10","B10", "C100", "C100", "C100")
y_2 <- as.Date(c("2013-07-23", "2012-01-01", "2011-08-29", "2011-10-20",
"2010-08-10", "2012-06-21", "2013-08-27", "2012-11-25"))
B <- data.frame(x_2, y_2)
A
and B
are two data frames I'm working with. I need to construct a new data frame from A
using the numbering/order of names as given in B
(i.e. x_2
). The new data frame will consist of the names in x_2
together with the earliest dates in each category from y_1
(for example, earliest date in "B10"
category from A
is 2004-10-20
, etc.) such that I end up with a new data frame A2
given below:
x_3 <- c("A1", "B10", "B10", "B10", "B10", "C100", "C100", "C100")
y_3 <- as.Date(c("2010-11-17","2004-10-20", "2004-10-20", "2004-10-20",
"2004-10-20","2004-06-22", "2004-06-22", "2004-06-22" ) )
A2 <- data.frame(x_3, y_3 )
My intention is to subtract the dates in A2
from B
to get a figure of duration in days. The above is a very simplified version of my real data. The data I'm working on is about 3000 rows.
Any help would be much appreciated and please let me know if anything needs clarification. I'm using version 2.15.2 of R in MAC.
Thanks
Upvotes: 2
Views: 179
Reputation: 5239
Here's one way to create your A2
:
C <- A[order(A$x_1,A$y_1),]
A2 <- C[match(B$x_2,C$x_1),]
Using order()
will make sure that the oldest date is first. Using match()
matches the first occurence of the variable so it will only pick up the oldest date.
Upvotes: 2
Reputation: 37734
I think it would suffice to simply get the min for each in A using aggregate, and then use B to choose the needed rows.
aggregate(A["y_1"], A["x_1"], min)[B$x_2,]
## x_1 y_1
## 1 A1 2010-11-17
## 2 B10 2004-10-20
## 2.1 B10 2004-10-20
## 2.2 B10 2004-10-20
## 2.3 B10 2004-10-20
## 3 C100 2004-06-22
## 3.1 C100 2004-06-22
## 3.2 C100 2004-06-22
Upvotes: 1
Reputation: 43245
Using the data.table
package, this is trivial.
library(data.table)
A <- data.table(A)
B <- data.table(B)
setkey(B, x_1)
firsts <- A[, head(sort(y_1), 1), keyby=x_1]
Merge the two together:
B[firsts]
# x_2 y_2 V1
# 1: A1 2013-07-23 2010-11-17
# 2: B10 2012-01-01 2004-10-20
# 3: B10 2011-08-29 2004-10-20
# 4: B10 2011-10-20 2004-10-20
# 5: B10 2010-08-10 2004-10-20
# 6: C100 2012-06-21 2004-06-22
# 7: C100 2013-08-27 2004-06-22
# 8: C100 2012-11-25 2004-06-22
Upvotes: 2