Reputation: 533
I hope somebody can help me with this. Below is a sample of the data I'm working on:
> print(sampleinput)
                Time ACTIVITY_X ACTIVITY_Y ACTIVITY_Z
 1: 6/19/18 10:40:00         60         74         95
 2: 6/19/18 10:41:20         62         63         88
 3: 6/19/18 10:42:40         60         56         82
 4: 6/19/18 10:44:00         66         61         90
 5: 6/19/18 10:45:20         60         53         80
 6: 6/19/18 10:46:40         57         40         70
 7: 6/19/18 10:48:00         54         41         68
 8: 6/19/18 10:49:20         52         49         71
 9: 6/19/18 10:50:40         61         49         78
10: 6/19/18 10:52:00         93         32         98
11: 6/19/18 10:53:20         80         54         97
12: 6/19/18 10:54:40         73         39         83
13: 6/19/18 10:56:00         47         37         60
14: 6/19/18 10:57:20         51         55         75
15: 6/19/18 10:58:40         51         60         79
16: 6/19/18 11:00:00         14         13         19
17: 6/19/18 11:01:20          0          0          0
18: 6/19/18 11:02:40         13          3         13
19: 6/19/18 11:04:00         20         10         22
20: 6/19/18 11:05:20         13          6         14
Below is a sample of how I would like my data to be transformed:
> print(sampleoutput)
               Time ACTIVITY_X ACTIVITY_Y ACTIVITY_Z
1: 6/19/18 10:40:00          0          0          0
2: 6/19/18 10:44:00        188        180        260
3: 6/19/18 10:48:00        171        134        218
4: 6/19/18 10:52:00        206        130        247
5: 6/19/18 10:56:00        200        130        240
6: 6/19/18 11:00:00        116        128        173
7: 6/19/18 11:04:00         33         13         35
As you can see in the Time column, measurements were taken every 80 seconds, with associated values in columns ACTIVITY_X, ACTIVITY_Y and ACTIVITY_Z.
I would like to transform my dataset in a way so that instead of having date and time measures every 80 seconds, I will have them every 240 seconds (4 minutes).
As for the values in columns ACTIVITY_X, ACTIVITY_Y and ACTIVITY_Z, that would imply summing every three rows into a single row. Note that, for each time, the associated ACTIVITY_X, ACTIVITY_Y and ACTIVITY_Z measures are calculated over the interval [previous time + 1 second to current time].
Example: For the second line, with time 6/19/18 10:41:20, the values for ACTIVITY_X, ACTIVITY_Y and ACTIVITY_Z correspond to a sampling interval of [6/19/18 10:40:01 to 6/19/18 10:41:20].
The first row of the output table will have 0 for the ACTIVITY_X, ACTIVITY_Y and ACTIVITY_Z values, since I'm missing the values from the two previous time intervals, but that's no problem.
I'm quite new to R, and although I've been able to transform my data with Excel, I would like to automate this process, which seems quite complex.
Hope this is an interesting question and any input is appreciated!
PS: I can't use dput because the dataset is too big, but I've uploaded it here
Upvotes: 0
Views: 539
Reputation: 385
lubridate should have the functionality you need. It looks like you are already using data.table, so I'm using that here...
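library(data.table)
X <- data.table(sampleinput)
# Parse the timestamps, then round each one up to the next 4-minute boundary
X$Time <- as.POSIXct(X$Time, format = '%m/%d/%y %H:%M:%S')
X$tgroup <- lubridate::ceiling_date(X$Time, '4 mins')
# Sum each activity column within every 4-minute group
X[, list(x = sum(ACTIVITY_X),
         y = sum(ACTIVITY_Y),
         z = sum(ACTIVITY_Z)), by = list(tgroup)]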
Returns
                tgroup   x   y   z
1: 2018-06-19 10:40:00  60  74  95
2: 2018-06-19 10:44:00 188 180 260
3: 2018-06-19 10:48:00 171 134 218
4: 2018-06-19 10:52:00 206 130 247
5: 2018-06-19 10:56:00 200 130 240
6: 2018-06-19 11:00:00 116 128 173
7: 2018-06-19 11:04:00  33  13  35
8: 2018-06-19 11:08:00  13   6  14
You could change the boundary cases manually if necessary: the first row holds only the 10:40:00 reading, and the last row (11:08:00) holds only the 11:05:20 reading.
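One possible way to handle the boundary groups, as a rough sketch rather than the only option: count the readings in each 4-minute group, zero out the first group if it is incomplete, and drop a trailing incomplete group. The helper column n, the variable res, and the choice of tz = "UTC" are my own assumptions for illustration; the sample rows are rebuilt from the question so the snippet runs on its own.

```r
library(data.table)

# Sample rows from the question: 20 readings taken 80 seconds apart.
# tz = "UTC" is an assumption; the question does not state a time zone.
sampleinput <- data.table(
  Time = as.POSIXct("2018-06-19 10:40:00", tz = "UTC") + 80 * (0:19),
  ACTIVITY_X = c(60, 62, 60, 66, 60, 57, 54, 52, 61, 93,
                 80, 73, 47, 51, 51, 14, 0, 13, 20, 13),
  ACTIVITY_Y = c(74, 63, 56, 61, 53, 40, 41, 49, 49, 32,
                 54, 39, 37, 55, 60, 13, 0, 3, 10, 6),
  ACTIVITY_Z = c(95, 88, 82, 90, 80, 70, 68, 71, 78, 98,
                 97, 83, 60, 75, 79, 19, 0, 13, 22, 14)
)

X <- copy(sampleinput)
# Round each timestamp up to the next 4-minute boundary
X[, tgroup := lubridate::ceiling_date(Time, "4 mins")]

# Same sums as before, plus n = number of readings in each group
res <- X[, .(x = sum(ACTIVITY_X),
             y = sum(ACTIVITY_Y),
             z = sum(ACTIVITY_Z),
             n = .N), by = tgroup]

# First group: set to 0 when it has fewer than the 3 readings of a full group
res[n < 3 & tgroup == min(tgroup), c("x", "y", "z") := list(0, 0, 0)]

# Last group: drop it when it is incomplete
res <- res[!(n < 3 & tgroup == max(tgroup))]

# Remove the helper column before printing
res[, n := NULL]
print(res)
```

On the sample data this should leave seven rows, from 10:40:00 to 11:04:00, with the same sums as sampleoutput in the question. On the full dataset you may prefer to keep the last group if it happens to contain all three readings, which the n < 3 check already allows for.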
Upvotes: 2