Reputation: 119
I am new to R and rather flumoxed by the following problem. I have two vectors of dates (the vectors are not necessarily aligned, nor of the same length).
I want to find for each date in the first vector the next date in the second vector.
vecA <- as.Date(c('1951-07-01', '1953-01-01', '1957-04-01', '1958-12-01',
'1963-06-01', '1965-05-01'))
vecB <- as.Date(c('1952-01-12', '1952-02-01', '1954-03-01', '1958-08-01',
'1959-03-01', '1964-03-01', '1966-05-01'))
In SQL I would write something like this, but I cannot find any tips in SO as to how to do this in R.
select vecA.Date, min(vecB.Date)
from vecA inner join vecB
on vecA.Date < vecB.Date
group by vecA.Date
The output should look like this:
Start End
1951-07-01 1952-01-12
1953-01-01 1954-03-01
1957-04-01 1958-08-01
1958-12-01 1959-03-01
1963-06-01 1964-03-01
1965-05-01 1966-05-01
Upvotes: 2
Views: 88
Reputation: 92282
Here's a possible solution using data.table
rolling joins
library(data.table)
dt1 <- as.data.table(vecA) ## convert to `data.table` object
dt2 <- as.data.table(vecB) ## convert to `data.table` object
setkey(dt2) # key in order to perform a binary join
res <- dt2[dt1, vecB, roll = -Inf, by = .EACHI] # run the inner join while selecting closest date
setnames(res, c("Start", "End"))
res
# Start End
# 1: 1951-07-01 1952-01-12
# 2: 1953-01-01 1954-03-01
# 3: 1957-04-01 1958-08-01
# 4: 1958-12-01 1959-03-01
# 5: 1963-06-01 1964-03-01
# 6: 1965-05-01 1966-05-01
Alternatively, we can also do:
data.table(vecA=vecB, vecB, key="vecA")[dt1, roll=-Inf]
Upvotes: 2
Reputation: 6749
This code will do what you are asking, but it's not clear what you are trying to accomplish and so this might not be the best way. In essence, this code first orders both vectors to ensure they are in the same ordering. Then, using a for
loop, it loops over all the elements in vecA
and uses x < vecB
to find out which elements in vecB
are less than x
.
That is wrapped in which
, which returns the numeric index of of each TRUE
element of a vector, and then in min
which gives the smallest numeric index. This is then used to subset vecB
to return the date; it's all wrapped in print
so you can see the output of the loop.
This is probably not the best way of doing this, but without more context on your goals it should at least get you started.
> vecA <- vecA[order(vecA)]
> vecB <- vecB[order(vecB)]
> for(x in vecA) {print(vecB[min(which(x < vecB))])}
[1] "1952-01-12"
[1] "1954-03-01"
[1] "1958-08-01"
[1] "1959-03-01"
[1] "1964-03-01"
[1] "1966-05-01"
Upvotes: 1