Reputation: 539
My one table has data with date mentioned in last two columns:
dat<- data.frame(a = c(rep("x",3)),
date1=c(seq(as.Date("2018-01-01"), as.Date("2018-01-3"), 1)),
date2=c(seq(as.Date("2018-01-08"), as.Date("2018-01-10"), 1)))
a date1 date2
1 x 2018-01-01 2018-01-08
2 x 2018-01-02 2018-01-09
3 x 2018-01-03 2018-01-10
My another table has what kind of day each day is
cal <- data.frame(dt = c(seq(as.Date("2018-01-01"), as.Date("2018-01-10"),1)),
day = c(rep("workday",5), rep("holiday",1), rep("weekend",4)))
How to get number of days in table 1(dat) as anew column such that the it counts only the workday that falls in the range mentioned in column 2 and column 3?
Example output with 4 columns. The last column is the number of workdays for the date range in previous two columns
a date1 date2 countdown
1 x 2018-01-01 2018-01-08 5
2 x 2018-01-02 2018-01-09 4
3 x 2018-01-03 2018-01-10 3
Upvotes: 0
Views: 97
Reputation: 27732
data.table
solution
library( data.table )
#set data to data.table format
setDT(dat); setDT(cal)
setkey(dat, date1, date2 )
dat[dat,
N := { val = cal[ day == "workday" & dt >= i.date1 & dt <= i.date2 ]
list( nrow( val ) ) },
by = .EACHI ]
# a date1 date2 N
# 1: x 2018-01-01 2018-01-08 5
# 2: x 2018-01-02 2018-01-09 4
# 3: x 2018-01-03 2018-01-10 3
update data.table::foverlaps() solution
library( data.table )
#set data to data.table format
setDT(dat); setDT(cal)
#create dummy date
cal[,dt2 := dt]
#set keys
setkey( dat, date1, date2 )
setkey( cal, dt, dt2 )
#overlap join
ans <- foverlaps( dat, cal )
#summarise
ans[, .( countdown = uniqueN( dt[day == "workday"] ) ), by = .(a, date1, date2)][]
# a date1 date2 countdown
# 1: x 2018-01-01 2018-01-08 5
# 2: x 2018-01-02 2018-01-09 4
# 3: x 2018-01-03 2018-01-10 3
Upvotes: 2
Reputation: 8880
additional solution
# v1
df %>%
rowwise() %>%
mutate(int_date = list(seq(date1, date2, "1 day"))) %>%
unnest(int_date) %>%
left_join(cal, by = c("int_date" = "dt")) %>%
filter(day == "workday") %>%
group_by(a, date1, date2) %>%
count
# v2
df %>%
rowwise() %>%
mutate(int_date = list(seq(date1, date2, "1 day")),
out = sum(unlist(int_date) %in% cal$dt[cal$day == "workday"])) %>%
select(-int_date)
# v3 (using @Ronak Shah hint with a `map` )
df %>%
mutate(int_date = map2(date1, date2, seq, "1 day"),
out = map_dbl(int_date, ~ sum(.x %in% cal$dt[cal$day == "workday"]))) %>%
select(-int_date)
# A tibble: 3 x 4
# Rowwise:
a date1 date2 out
<chr> <date> <date> <int>
1 x 2018-01-01 2018-01-08 5
2 x 2018-01-02 2018-01-09 4
3 x 2018-01-03 2018-01-10 3
Upvotes: 0
Reputation: 101064
A base R option
within(
dat,
countdown <- sapply(
1:nrow(dat),
function(k) sum(cal$day == "workday" & !is.na(cut(cal$dt, c(date1[k], date2[k]))))
)
)
giving
a date1 date2 countdown
1 x 2018-01-01 2018-01-08 5
2 x 2018-01-02 2018-01-09 4
3 x 2018-01-03 2018-01-10 3
Upvotes: 1
Reputation: 388817
A way using tidyverse
functions :
Create a sequence of days between date1
and date2
Get the data in long format
Left join data the above data with cal
dataframe
Calculate number of workdays for each row.
library(dplyr)
dat %>%
mutate(row = row_number(),
dt = purrr::map2(date1, date2, seq, by = '1 day')) %>%
tidyr::unnest(dt) %>%
left_join(cal, by = 'dt') %>%
group_by(row, a, date1, date2) %>%
summarise(countdown = sum(day == 'workday')) %>%
ungroup() %>%
select(-row)
# a date1 date2 countdown
# <chr> <date> <date> <int>
#1 x 2018-01-01 2018-01-08 5
#2 x 2018-01-02 2018-01-09 4
#3 x 2018-01-03 2018-01-10 3
Upvotes: 1