Reputation: 13332
I'm trying to store some intervals in a data frame. A cut-down version of the code that does this is here:
library(lubridate)
DateHired <- c("29/09/14", "07/04/08", "18/06/09", "09/03/15", "30/05/11", "05/11/07", "08/09/08", "30/09/13", "10/08/09", "13/08/14", "18/09/06", "21/01/08", "05/12/11", "28/06/10", "19/07/10", "05/05/14", "26/08/09", "21/04/08", "19/10/09")
TerminationDate <- c("11/06/10", "10/02/10", "06/10/09", "02/04/15", "30/06/11", "10/11/07", "17/04/14", "04/10/13", "08/02/12", "11/06/10", "03/07/09", "11/06/10", "08/08/13", "23/12/10", "20/12/13", "11/06/10", "11/06/10", "05/12/08", "01/03/10")
tenures = data.frame(DateHired, TerminationDate, stringsAsFactors=FALSE)
tenures$isoStart <- as.Date(tenures$DateHired, format="%d/%m/%Y")
tenures$isoFinish <- as.Date(tenures$TerminationDate, format="%d/%m/%Y")
tenures$periods = apply(tenures, 1, function(x) interval(x['isoStart'], x['isoFinish']) )
This ends up with this result:
> tenures$periods
[1] -135734400 58233600 9504000 2073600 2678400 432000 176860800 345600 78796800 -131673600 88041600 75340800
[13] 52876800 15379200 108000000 -123033600 24969600 19699200 11491200
When I do the same thing manually, i.e.
> interval(as.Date("29/09/14", format="%d/%m/%Y"),as.Date("29/09/15", format="%d/%m/%Y") )
[1] 14-09-29 10:04:52 LMT--15-09-29 10:04:52 LMT
it gives a lubridate interval.
I can probably solve this in other ways, but I was hoping to use the intervals in the next part of the puzzle!
Reputation: 28441
tenures$isoStart <- as.Date(tenures$DateHired, format="%d/%m/%y")
tenures$isoFinish <- as.Date(tenures$TerminationDate, format="%d/%m/%y")
tenures$periods = interval(tenures$isoStart, tenures$isoFinish)
Your date format "%d/%m/%Y" did not match the two-digit years in your data. The capital %Y is for four-digit years; the lowercase %y is for two-digit years.
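To make the difference concrete, here is a minimal sketch that parses one value from the question with both format strings. The variable names (raw, wrong, right) are only illustrative and not part of the original code.

```r
# One sample value in the same dd/mm/yy layout as the question's data
raw <- "29/09/14"

# %Y expects a year with century, so "14" is read literally as the year 14
wrong <- as.Date(raw, format = "%d/%m/%Y")

# %y expects a two-digit year; R maps 00-68 to 20xx and 69-99 to 19xx
right <- as.Date(raw, format = "%d/%m/%y")

# Pull out the numeric years so the two parses can be compared directly
wrong_year <- as.POSIXlt(wrong)$year + 1900
right_year <- as.POSIXlt(right)$year + 1900

print(c(wrong = wrong_year, right = right_year))
```

This is consistent with the manual interval in the question, which prints a year of 14 rather than 2014.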
Also, the interval function is vectorized, meaning it takes the first element of each vector and creates an interval, then moves on to the second element of each, and continues to the end.
head(tenures$periods)
#[1] 2014-09-28 20:00:00 EDT--2010-06-10 20:00:00 EDT 2008-04-06 20:00:00 EDT--2010-02-09 19:00:00 EST
#[3] 2009-06-17 20:00:00 EDT--2009-10-05 20:00:00 EDT 2015-03-08 20:00:00 EDT--2015-04-01 20:00:00 EDT
#[5] 2011-05-29 20:00:00 EDT--2011-06-29 20:00:00 EDT 2007-11-04 19:00:00 EST--2007-11-09 19:00:00 EST
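As a small self-contained check of the vectorized behaviour, the sketch below builds intervals from two rows of the question's data. It assumes lubridate is installed; the names starts, finishes and periods are illustrative only.

```r
library(lubridate)

# Second and third rows of the question's data, parsed with two-digit years
starts   <- as.Date(c("07/04/08", "18/06/09"), format = "%d/%m/%y")
finishes <- as.Date(c("10/02/10", "06/10/09"), format = "%d/%m/%y")

# One call pairs the vectors element by element; no apply() is needed
periods <- interval(starts, finishes)

print(is.interval(periods))  # the result keeps its Interval class
print(length(periods))       # one interval per pair of dates
print(int_length(periods))   # span of each interval in seconds
```

The result can be assigned straight to a data frame column, and int_length() recovers the span in seconds when a plain number is needed.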
Why didn't your first function work? Well, it did work in a sense. The output is the span between the two dates, but the format/class was unexpected. Instead of the interval output, the number of seconds between the two dates was given.
For more on coercion, see ?apply:
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
apply will work on data frames, but with the caveat that the results may not be what you expect after the coercion to a matrix. lapply is friendlier towards data frames, and in this case the function is already vectorized.
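The coercion can be seen directly with a one-row data frame. This is only a sketch of the matrix conversion step; the data frame df and its column names are made up for illustration.

```r
# A tiny data frame with two Date columns (hypothetical names)
df <- data.frame(
  start  = as.Date("2008-04-07"),
  finish = as.Date("2010-02-10")
)

# apply() first converts the data frame with as.matrix(),
# which turns Date columns into character strings
m <- as.matrix(df)

print(class(df$start))  # the column itself is still a Date
print(typeof(m))        # the matrix holds character data

# Inside apply(), each row therefore arrives as a character vector
row_classes <- apply(df, 1, function(x) class(x[["start"]]))
print(row_classes)
```

So the function passed to apply never sees Date objects, which is one reason to call a vectorized function on the columns directly.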