Reputation: 113
I am trying to count the days that fall between specific dates. I have two columns, both character vectors.
start.date <- c("2015-01-10","2015-01-11","2015-02-24")
end.date <- c("2015-03-10","2015-04-01","2015-06-13")
date10 <- data.frame(cbind(start.date,end.date))
date10$start.date <- as.character(date10$start.date)
date10$end.date <- as.character(date10$end.date)
str(date10)
The specific dates run from 2015-04-11 to 2015-07-10,
so I generated every date in that range with seq().
library(lubridate)  # provides ymd()
sp.da1<-ymd("2015-04-11")
sp.da2<-ymd("2015-07-10")
inteval.da<-seq(sp.da1, sp.da2, by = 'day')
I want to know how many days of each row's range fall between the specific dates.
I tried seq(start.date, end.date, by = 'day') as above, but I get this error: 'from' must be of length 1
Please help!
Upvotes: 3
Views: 20118
Reputation: 7174
You are asking how many days of a given time interval fall inside a main time interval.
Let's first set up the three varying time intervals. Then we write a function that checks, for every day x, whether it is inside or outside the main interval. Summing the days that are inside the main interval gives what you're looking for:
date10$start.date <- as.Date(date10$start.date, format="%Y-%m-%d")
date10$end.date <- as.Date(date10$end.date, format="%Y-%m-%d")
your_intervals <- Map(seq, from = date10[, 1], to = date10[, 2], by = "days")
your_intervals is a list of three Date vectors, each containing every day in its interval.
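is_in_interval <- function(x, l_bound = sp.da1, u_bound = sp.da2){
  # TRUE for days strictly between the bounds; the whole expression must sit
  # inside return(), otherwise only (x > l_bound) would be returned
  return((x > l_bound) & (x < u_bound))
}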
sapply(your_intervals, function(x) sum(is_in_interval(x)))
# [1] 0 0 63
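If you only need the counts, the overlap can also be computed directly, without building the sequences first. This is just a minimal sketch in base R, assuming the same strict bounds as is_in_interval above (so the boundary days 2015-04-11 and 2015-07-10 themselves are not counted). The names lo, hi, first_inside, last_inside and days_inside are made up for illustration:

```r
start <- as.Date(c("2015-01-10", "2015-01-11", "2015-02-24"))
end   <- as.Date(c("2015-03-10", "2015-04-01", "2015-06-13"))
lo    <- as.Date("2015-04-11")  # lower bound of the main interval (excluded)
hi    <- as.Date("2015-07-10")  # upper bound of the main interval (excluded)

# Work on plain day numbers so pmax()/pmin() operate on ordinary numerics
first_inside <- pmax(as.numeric(start), as.numeric(lo) + 1)
last_inside  <- pmin(as.numeric(end),   as.numeric(hi) - 1)

# Rows that end before the main interval starts give a negative span; clamp to 0
days_inside <- pmax(0, last_inside - first_inside + 1)
days_inside
# [1]  0  0 63
```

If the boundary days should count as well, drop the + 1 and - 1 on lo and hi.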
Upvotes: 3
Reputation: 2222
First off: why aren't the columns start.date and end.date already of class Date? If you stored these columns as dates, you wouldn't have to convert them every time you want to use them as such. In your example code, you're passing whole character vectors to the seq() function, which expects a single start value and a single end value.
The following code should give you a sequence of dates for every row in your dataframe:
library(lubridate)  # for ymd()
apply(date10, 1, function(x) seq(ymd(x['start.date']), ymd(x['end.date']), 'day'))
What this code does is slice your dataframe into rows and consider them one by one. This means you are passing one start date and one end date to the seq function each time, instead of a whole column of each. Passing whole columns is what caused your error.
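Note that apply() first turns the dataframe into a character matrix, which is why the dates have to be parsed again inside the function. A rough sketch of the same row-by-row idea in base R only, assuming the columns are converted to Date up front (the names start, end and seqs are just for illustration):

```r
start <- as.Date(c("2015-01-10", "2015-01-11", "2015-02-24"))
end   <- as.Date(c("2015-03-10", "2015-04-01", "2015-06-13"))

# One seq() call per row: each call gets exactly one start and one end date
seqs <- lapply(seq_along(start), function(i) seq(start[i], end[i], by = "day"))

# Number of days in each sequence, counting both endpoints
sapply(seqs, length)
# [1]  60  81 110
```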
If you want to know the number of days in between, I would suggest this solution:
date10$diff <- ymd(date10$end.date) - ymd(date10$start.date)
This creates a column of the 'difftime' class. You can convert it to a simple integer by wrapping it in as.integer():
date10$diff <- as.integer(ymd(date10$end.date) - ymd(date10$start.date))
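For the example data this can be checked quickly. The sketch below uses base as.Date() instead of ymd(), purely so that it runs without loading lubridate; that swap is my assumption, the result should be the same:

```r
date10 <- data.frame(
  start.date = c("2015-01-10", "2015-01-11", "2015-02-24"),
  end.date   = c("2015-03-10", "2015-04-01", "2015-06-13"),
  stringsAsFactors = FALSE
)

# Subtracting two Date vectors gives a difftime in days; as.integer() drops the class
date10$diff <- as.integer(as.Date(date10$end.date) - as.Date(date10$start.date))
date10$diff
# [1]  59  80 109
```

Keep in mind this is the length of each row's own range, not the part that falls inside 2015-04-11 to 2015-07-10.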
Upvotes: 0