John_dydx
John_dydx

Reputation: 961

Manipulating data frames in R

I'd really appreciate it if anyone could help out with the task below. I've searched the R topics but haven't found anything to specifically address this. I've also tried which, table but not really getting anywhere with it.

x_1 <- c("A1", "A1", "B10", "B10", "B10", "C100", "C100", "C100")
y_1 <- as.Date(c("2010-11-17", "2010-11-17", "2010-11-17", "2009-02-14",
                 "2004-10-20", "2008-03-21", "2010-09-30", "2004-06-22"))
A <- data.frame(x_1, y_1)

x_2 <- c("A1", "B10", "B10", "B10","B10", "C100", "C100", "C100")
y_2 <- as.Date(c("2013-07-23", "2012-01-01", "2011-08-29", "2011-10-20",
                 "2010-08-10", "2012-06-21", "2013-08-27", "2012-11-25"))
B <- data.frame(x_2, y_2)

A and B are two data frames I'm working with. I need to construct a new data frame from A using the numbering/order of names as given in B (i.e. x_2). The new data frame will consist of the names in x_2 together with the earliest dates in each category from y_1 (for example, earliest date in "B10" category from A is 2004-10-20, etc.) such that I end up with a new data frame A2 given below:

x_3 <- c("A1", "B10", "B10", "B10", "B10", "C100", "C100", "C100")
y_3 <- as.Date(c("2010-11-17","2004-10-20", "2004-10-20", "2004-10-20",
                 "2004-10-20","2004-06-22", "2004-06-22", "2004-06-22" ) )
A2 <- data.frame(x_3, y_3 )

My intention is to subtract the dates in A2 from B to get a figure of duration in days. The above is a very simplified version of my real data. The data I'm working on is about 3000 rows.

Any help would be much appreciated and please let me know if anything needs clarification. I'm using version 2.15.2 of R in MAC.

Thanks

Upvotes: 2

Views: 179

Answers (3)

Sam Dickson
Sam Dickson

Reputation: 5239

Here's one way to create your A2:

C <- A[order(A$x_1,A$y_1),]
A2 <- C[match(B$x_2,C$x_1),]

Using order() will make sure that the oldest date is first. Using match() matches the first occurence of the variable so it will only pick up the oldest date.

Upvotes: 2

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37734

I think it would suffice to simply get the min for each in A using aggregate, and then use B to choose the needed rows.

aggregate(A["y_1"], A["x_1"], min)[B$x_2,]

##      x_1        y_1
## 1     A1 2010-11-17
## 2    B10 2004-10-20
## 2.1  B10 2004-10-20
## 2.2  B10 2004-10-20
## 2.3  B10 2004-10-20
## 3   C100 2004-06-22
## 3.1 C100 2004-06-22
## 3.2 C100 2004-06-22

Upvotes: 1

Justin
Justin

Reputation: 43245

Using the data.table package, this is trivial.

library(data.table)
A <- data.table(A)
B <- data.table(B)
setkey(B, x_1)

firsts <- A[, head(sort(y_1), 1), keyby=x_1]

Merge the two together:

B[firsts]

#     x_2        y_2         V1
# 1:   A1 2013-07-23 2010-11-17
# 2:  B10 2012-01-01 2004-10-20
# 3:  B10 2011-08-29 2004-10-20
# 4:  B10 2011-10-20 2004-10-20
# 5:  B10 2010-08-10 2004-10-20
# 6: C100 2012-06-21 2004-06-22
# 7: C100 2013-08-27 2004-06-22
# 8: C100 2012-11-25 2004-06-22

Upvotes: 2

Related Questions