user6108954

Reputation: 21

Optimize calculation of number of weekdays

I want to calculate the number of weekdays between two dates in R, and I am using the following code:

Nweekdays <- Vectorize(function(a, b) sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))

temp$diff <- Nweekdays(temp$from,temp$to)

This code works fine on a small data set (0.1 million rows), but on a large one (5 million rows) it runs for hours and still does not finish.

Please suggest a way to do this calculation faster.

Upvotes: 2

Views: 87

Answers (1)

G. Grothendieck

Reputation: 269654

Here are some alternatives:

1) Break into whole weeks and a fraction of a week. If the code is slow because the sequences are long, this ensures that no sequence is ever longer than a week. Here from and to are the from and to dates:

weeks <- as.numeric(to - from) %/% 7
5*weeks + Nweekdays(7*weeks+from, to)

For example, with these from and to values it gives identical results to Nweekdays:

from <- as.Date("2016-03-27") 
to <- as.Date("2016-04-03")
weeks <- as.numeric(to - from) %/% 7
5*weeks + Nweekdays(7*weeks+from, to)
## [1] 5

Nweekdays(from, to)
## [1] 5
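
As a minimal sketch, the same idea can be wrapped in a function and checked on a few longer ranges. The name Nweekdays1 and the sample dates are assumptions made for illustration, and weekdays() is assumed to return English day names, as in the question:

```r
# Original approach from the question: enumerate every day in the range.
# Assumes an English locale, since weekdays() returns localized names.
Nweekdays <- Vectorize(function(a, b)
  sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))

# Hypothetical wrapper (not part of the original answer): each whole week
# contributes 5 weekdays, so only the remaining partial week is enumerated.
Nweekdays1 <- function(from, to) {
  weeks <- as.numeric(to - from) %/% 7
  5 * weeks + Nweekdays(from + 7 * weeks, to)
}

# Illustrative dates only.
from <- as.Date("2016-01-04") + c(0, 3, 10)
to   <- from + c(30, 365, 1000)

# Compare against the slow version on the same inputs.
all.equal(Nweekdays1(from, to), Nweekdays(from, to))
```

Note that this still calls seq() and weekdays() once per row, so it shortens each sequence but does not remove the per-row loop.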

2) Precompute the fraction-of-a-week part. Once whole weeks are removed, from and to are less than a week apart, so we can precompute all 49 possibilities in a 7 by 7 matrix m whose rows and columns are the day of the week of from and to respectively (the first row is Sun, the next is Mon, etc., and similarly for columns). We then define Nweekdays2, which computes 5 times the number of whole weeks plus the lookup value in m for the partial week.

# precompute m
sun <- as.Date("2012-01-01") # any Sunday will do
m <- outer(0:6, 0:6, function(x, y) Nweekdays(sun + x, sun + y + 7*(y < x)))

Nweekdays2 <- function(from, to) {
  weeks <- as.integer(to - from) %/% 7L
  5L * weeks + m[cbind(as.POSIXlt(from)$wday + 1L, as.POSIXlt(to)$wday + 1L)]
}

# test

set.seed(123)
from <- as.Date("2000-01-01") + 0:99
to <- from + sample(100, 100)

identical(Nweekdays2(from, to), Nweekdays(from, to))
## [1] TRUE
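
To connect this back to the question, here is a rough sketch of applying Nweekdays2 to a data frame with from and to columns. The data frame below is a randomly generated stand-in for the asker's temp, and its size and date ranges are assumptions chosen only for illustration:

```r
# Slow reference implementation from the question (assumes an English
# locale for weekdays()).
Nweekdays <- Vectorize(function(a, b)
  sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))

# Lookup table and fast function, as defined in the answer above.
sun <- as.Date("2012-01-01")  # a Sunday
m <- outer(0:6, 0:6, function(x, y) Nweekdays(sun + x, sun + y + 7 * (y < x)))
Nweekdays2 <- function(from, to) {
  weeks <- as.integer(to - from) %/% 7L
  5L * weeks + m[cbind(as.POSIXlt(from)$wday + 1L, as.POSIXlt(to)$wday + 1L)]
}

# Hypothetical stand-in for the asker's data frame.
set.seed(1)
n <- 1e5
temp <- data.frame(from = as.Date("2010-01-01") + sample(3000, n, replace = TRUE))
temp$to <- temp$from + sample(0:500, n, replace = TRUE)

# One vectorized call over all rows; no per-row seq() is generated.
temp$diff <- Nweekdays2(temp$from, temp$to)

# Spot-check a random subset of rows against the slow version.
idx <- sample(n, 50)
identical(temp$diff[idx], Nweekdays(temp$from[idx], temp$to[idx]))
```

The only per-row work left is integer arithmetic and a matrix lookup, which is why this version should scale much better than enumerating days. Actual timings will depend on the data and machine.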

As an alternative to the definition of m above, inspecting m shows that it can be constructed directly like this:

Rm <- row(diag(7)); Cm <- col(diag(7))
m <- (1 + 5 * (Cm < Rm)) * (Rm > 1) * (Cm < 7) - (Rm == 1 & Cm == 7) + Cm - Rm
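
A quick way to gain confidence in the direct formula is to compare it with the precomputed table. In the sketch below the names m_lookup and m_direct are introduced only for this comparison. One detail worth knowing: the direct formula yields a double matrix, whereas the outer() version is integer, so an identical() test against Nweekdays would need the storage mode converted first.

```r
# Reference implementation from the question (assumes an English locale).
Nweekdays <- Vectorize(function(a, b)
  sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))

# Table built by enumeration, as in the answer.
sun <- as.Date("2012-01-01")  # a Sunday
m_lookup <- outer(0:6, 0:6,
                  function(x, y) Nweekdays(sun + x, sun + y + 7 * (y < x)))

# Table built from the closed-form expression.
Rm <- row(diag(7)); Cm <- col(diag(7))
m_direct <- (1 + 5 * (Cm < Rm)) * (Rm > 1) * (Cm < 7) -
  (Rm == 1 & Cm == 7) + Cm - Rm

# Same values, different storage modes.
all.equal(m_lookup, m_direct)
storage.mode(m_lookup)
storage.mode(m_direct)

# Convert if an integer result is wanted from Nweekdays2.
storage.mode(m_direct) <- "integer"
```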

Upvotes: 2
