The Gardener
The Gardener

Reputation: 119

Conditional group by join in R

I am new to R and rather flumoxed by the following problem. I have two vectors of dates (the vectors are not necessarily aligned, nor of the same length).

I want to find for each date in the first vector the next date in the second vector.

vecA <- as.Date(c('1951-07-01', '1953-01-01', '1957-04-01', '1958-12-01',
       '1963-06-01', '1965-05-01'))
vecB <- as.Date(c('1952-01-12', '1952-02-01', '1954-03-01', '1958-08-01', 
           '1959-03-01', '1964-03-01', '1966-05-01'))

In SQL I would write something like this, but I cannot find any tips in SO as to how to do this in R.

 select vecA.Date, min(vecB.Date)
       from vecA inner join vecB
       on vecA.Date < vecB.Date
       group by vecA.Date

The output should look like this:

Start      End
1951-07-01 1952-01-12
1953-01-01 1954-03-01
1957-04-01 1958-08-01
1958-12-01 1959-03-01
1963-06-01 1964-03-01
1965-05-01 1966-05-01

Upvotes: 2

Views: 88

Answers (2)

David Arenburg
David Arenburg

Reputation: 92282

Here's a possible solution using data.table rolling joins

library(data.table)
dt1 <- as.data.table(vecA) ## convert to `data.table` object
dt2 <- as.data.table(vecB) ## convert to `data.table` object
setkey(dt2) # key in order to perform a binary join
res <- dt2[dt1, vecB, roll = -Inf, by = .EACHI] # run the inner join while selecting closest date 
setnames(res, c("Start", "End"))
res
#         Start        End
# 1: 1951-07-01 1952-01-12
# 2: 1953-01-01 1954-03-01
# 3: 1957-04-01 1958-08-01
# 4: 1958-12-01 1959-03-01
# 5: 1963-06-01 1964-03-01
# 6: 1965-05-01 1966-05-01

Alternatively, we can also do:

data.table(vecA=vecB, vecB, key="vecA")[dt1, roll=-Inf]

Upvotes: 2

TARehman
TARehman

Reputation: 6749

This code will do what you are asking, but it's not clear what you are trying to accomplish and so this might not be the best way. In essence, this code first orders both vectors to ensure they are in the same ordering. Then, using a for loop, it loops over all the elements in vecA and uses x < vecB to find out which elements in vecB are less than x.

That is wrapped in which, which returns the numeric index of of each TRUE element of a vector, and then in min which gives the smallest numeric index. This is then used to subset vecB to return the date; it's all wrapped in print so you can see the output of the loop.

This is probably not the best way of doing this, but without more context on your goals it should at least get you started.

> vecA <- vecA[order(vecA)]
> vecB <- vecB[order(vecB)]
> for(x in vecA) {print(vecB[min(which(x < vecB))])}
[1] "1952-01-12"
[1] "1954-03-01"
[1] "1958-08-01"
[1] "1959-03-01"
[1] "1964-03-01"
[1] "1966-05-01"

Upvotes: 1

Related Questions