Reputation: 1852
I have the following column in my data frame:
DateTime
1 2011-10-03 08:00:04
2 2011-10-03 08:00:05
3 2011-10-03 08:00:06
4 2011-10-03 08:00:09
5 2011-10-03 08:00:15
6 2011-10-03 08:00:24
7 2011-10-03 08:00:30
8 2011-10-03 08:00:42
9 2011-10-03 08:01:01
10 2011-10-03 08:01:24
11 2011-10-03 08:01:58
12 2011-10-03 08:02:34
13 2011-10-03 08:03:25
14 2011-10-03 08:04:26
15 2011-10-03 08:06:00
With dput:
> dput(smallDF)
structure(list(DateTime = structure(c(1317621604, 1317621605,
1317621606, 1317621609, 1317621615, 1317621624, 1317621630, 1317621642,
1317621661, 1317621684, 1317621718, 1317621754, 1317621805, 1317621866,
1317621960, 1317622103, 1317622197, 1317622356, 1317622387, 1317622463,
1317622681, 1317622851, 1317623061, 1317623285, 1317623404, 1317623498,
1317623612, 1317623849, 1317623916, 1317623994, 1317624174, 1317624414,
1317624484, 1317624607, 1317624848, 1317625023, 1317625103, 1317625179,
1317625200, 1317625209, 1317625229, 1317625238, 1317625249, 1317625264,
1317625282, 1317625300, 1317625315, 1317625339, 1317625353, 1317625365,
1317625371, 1317625381, 1317625395, 1317625415, 1317625423, 1317625438,
1317625458, 1317625469, 1317625487, 1317625500, 1317625513, 1317625533,
1317625548, 1317625565, 1317625581, 1317625598, 1317625613, 1317625640,
1317625661, 1317625674, 1317625702, 1317625715, 1317625737, 1317625758,
1317625784, 1317625811, 1317625826, 1317625841, 1317625862, 1317625895,
1317625909, 1317625935, 1317625956, 1317625973, 1317626001, 1317626043,
1317626062, 1317626100, 1317626113, 1317626132, 1317626153, 1317626179,
1317626212, 1317626239, 1317626271, 1317626296, 1317626323, 1317626361,
1317626384, 1317626407), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = "DateTime", row.names = c(NA,
-100L), class = "data.frame")
My goal: I want to calculate the time difference, in seconds, between each measurement.
Edit: I'm looking to get the following result, where the time difference (in seconds) between each data point is calculated, except for the first value of the day (line 3), where the time is calculated relative to 8 am:
DateTime Seconds
1 2011-09-30 21:59:02 6
2 2011-09-30 21:59:04 2
3 2011-10-03 08:00:04 4
4 2011-10-03 08:00:05 1
5 2011-10-03 08:00:06 1
6 2011-10-03 08:00:09 3
7 2011-10-03 08:00:15 6
8 2011-10-03 08:00:24 9
9 2011-10-03 08:00:30 6
10 2011-10-03 08:00:42 12
11 2011-10-03 08:01:01 19
12 2011-10-03 08:01:24 23
13 2011-10-03 08:01:58 34
14 2011-10-03 08:02:34 36
15 2011-10-03 08:03:25 51
16 2011-10-03 08:04:26 61
17 2011-10-03 08:06:00 94
However, the measurements start at 8:00 am, so if the value is the first of the day, the number of seconds relative to 8:00 am needs to be calculated. In the example above, the first measurement ends at 8:00:04, so using the $sec component of POSIXlt could work here, but on other days the first value may occur a few minutes after 8:00.
I've tried to achieve that goal with the following function:
SecondsInBar <- function(x, startTime){
# First data point or first of day
if (x == 1 || x > 1 && x$wkday != x[-1]$wkday){
seconds <- as.numeric(difftime(x,
as.POSIXlt(startTime, format = "%H:%M:%S"),
units = "secs"))
# else calculate time difference
} else {
seconds <- as.numeric(difftime(x, x[-1], units = "secs"))
}
return (seconds)
}
It could then be called with SecondsInBar(smallDF$DateTime, "08:00:00").
There are at least two problems with this function, but I don't know how to solve them:
x$wkday != x[-1]$wkday returns a "$ operator is invalid for atomic vectors" error.
as.POSIXlt(startTime, format = "%H:%M:%S") uses the current date, which makes the difftime calculation erroneous.
My question: Where am I going wrong with this function? And is this approach viable, or should I approach it from a different angle?
Reputation: 66844
How about something along these lines:
smallDF$DateTime - as.POSIXct(paste(strftime(smallDF$DateTime,"%Y-%m-%d"),"07:00:00"))
Time differences in secs
[1] 4 5 6 9 15 24 30 42 61 84 118 154 205 266 360
[16] 503 597 756 787 863 1081 1251 1461 1685 1804 1898 2012 2249 2316 2394
[31] 2574 2814 2884 3007 3248 3423 3503 3579 3600 3609 3629 3638 3649 3664 3682
[46] 3700 3715 3739 3753 3765 3771 3781 3795 3815 3823 3838 3858 3869 3887 3900
[61] 3913 3933 3948 3965 3981 3998 4013 4040 4061 4074 4102 4115 4137 4158 4184
[76] 4211 4226 4241 4262 4295 4309 4335 4356 4373 4401 4443 4462 4500 4513 4532
[91] 4553 4579 4612 4639 4671 4696 4723 4761 4784 4807
attr(,"tzone")
[1] ""
Note that I used 7am because, when I copied your data, my session interpreted the times as BST (British Summer Time).
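To illustrate why the reference hour shifts between machines, here is a small sketch using the first value from the dput output. The zone names are assumptions chosen for illustration (the asker's actual time zone is not stated; a UTC+2 zone such as Europe/Berlin would reproduce the 08:00 display), and it assumes the system's time zone database includes them.

```r
# First timestamp from the dput output, stored as seconds since the epoch
t0 <- as.POSIXct(1317621604, origin = "1970-01-01", tz = "UTC")

# The same instant prints with a different hour depending on the display zone
utc_time    <- format(t0, "%H:%M:%S", tz = "UTC")            # "06:00:04"
london_time <- format(t0, "%H:%M:%S", tz = "Europe/London")  # "07:00:04" (BST)
berlin_time <- format(t0, "%H:%M:%S", tz = "Europe/Berlin")  # "08:00:04" (assumed zone)
```

Passing an explicit tz to both strftime/format and as.POSIXct keeps the reference time and the data in the same zone, so the result would not depend on the machine's locale.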
As for your errors, you can't use $ to get elements of a date with POSIXct (which is how smallDF$DateTime is defined), only with POSIXlt. And for the second error, if you don't supply a date, it has to assume the current date, as there is no other information to draw upon.
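A minimal sketch of both points, using made-up timestamps rather than your data and a fixed UTC zone so it is reproducible. Two further things worth knowing: the POSIXlt component is called wday (there is no wkday), and x[-1] drops the first element rather than referring to the previous one.

```r
# Illustrative timestamps (not the asker's data); tz fixed for reproducibility
x <- as.POSIXct(c("2011-10-03 08:00:04", "2011-10-03 08:00:05",
                  "2011-10-04 08:02:10"), tz = "UTC")

# POSIXct is a numeric vector underneath, so `$` fails on it.
# Convert to POSIXlt first; the weekday component is `wday` (0 = Sunday).
wday <- as.POSIXlt(x)$wday

# Build the 08:00:00 reference on each timestamp's own date, not today's date
ref <- as.POSIXct(paste(format(x, "%Y-%m-%d"), "08:00:00"), tz = "UTC")
secs_since_start <- as.numeric(difftime(x, ref, units = "secs"))
```

Here wday comes out as 1, 1, 2 (Monday, Monday, Tuesday) and secs_since_start as 4, 5, 130.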
Edit
Now that it's been clarified, I would propose a different approach: split your data frame by day, then combine (with c) the times with the reference time and do diff on that, using lapply to loop over days:
#modify dataframe to add extra day to second half
smallDF[51:100,1] <- smallDF[51:100,1]+86400
smallDF2 <- split(smallDF,strftime(smallDF$DateTime,"%Y-%m-%d"))
lapply(smallDF2,function(x) diff(c(as.POSIXct(paste(strftime(x$DateTime[1],"%Y-%m-%d"),"07:00:00")),x$DateTime)))
$`2011-10-03`
Time differences in secs
[1] 4 1 1 3 6 9 6 12 19 23 34 36 51 61 94 143 94 159 31
[20] 76 218 170 210 224 119 94 114 237 67 78 180 240 70 123 241 175 80 76
[39] 21 9 20 9 11 15 18 18 15 24 14 12
$`2011-10-04`
Time differences in secs
[1] 3771 10 14 20 8 15 20 11 18 13 13 20 15 17 16
[16] 17 15 27 21 13 28 13 22 21 26 27 15 15 21 33
[31] 14 26 21 17 28 42 19 38 13 19 21 26 33 27 32
[46] 25 27 38 23 23
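If you want the result as a Seconds column in the original data frame rather than a list per day, something like the following might work. It is only a sketch: seconds_since_previous is a hypothetical helper name, the sample rows are made up, and it assumes the times are non-empty, sorted, and all in the time zone passed as tz.

```r
# Hypothetical helper: seconds since the previous row, or since `start`
# on the same date when the row is the first of its day.
seconds_since_previous <- function(times, start = "08:00:00", tz = "UTC") {
  day  <- format(times, "%Y-%m-%d", tz = tz)
  ref  <- as.POSIXct(paste(day, start), tz = tz)  # per-row reference time
  secs <- as.numeric(times)
  prev <- c(NA, secs[-length(secs)])              # previous row's time
  first_of_day <- c(TRUE, day[-1] != day[-length(day)])
  prev[first_of_day] <- as.numeric(ref)[first_of_day]
  secs - prev
}

# Made-up sample spanning two days
df <- data.frame(DateTime = as.POSIXct(
  c("2011-10-03 08:00:04", "2011-10-03 08:00:05", "2011-10-03 08:00:09",
    "2011-10-04 08:01:00", "2011-10-04 08:01:30"), tz = "UTC"))
df$Seconds <- seconds_since_previous(df$DateTime)
```

For this sample df$Seconds is 4, 1, 4, 60, 30. The vectorised comparison of each date with the one before it replaces the split/lapply step, which keeps the rows in their original order.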