Reputation: 169
I have a set of events identified by an individual "Tag" their location "location" the Start of the event "StartDateTime_UTC" and the End of the event "EndDateTime_UTC"
First 50 rows: (I'll place the full dput down below)
Tag location StartDateTime_UTC EndDateTime_UTC
<chr> <fct> <dttm> <dttm>
1 5004.24 IN 2014-09-30 05:30:00 2014-10-17 14:12:19
2 5004.24 Unresolved 2014-10-17 14:12:19 2014-10-17 14:15:43
3 5004.24 OUT 2014-10-17 14:15:43 2014-11-04 10:05:31
4 5004.24 Unresolved 2014-11-04 10:05:31 2014-11-04 10:08:06
5 5004.24 IN 2014-11-04 10:08:06 2014-11-12 10:50:28
6 5004.24 Unresolved 2014-11-12 10:50:28 2014-11-12 10:51:33
7 5004.24 OUT 2014-11-12 10:51:33 2014-12-24 04:20:28
8 5004.24 Unresolved 2014-12-24 04:20:28 2014-12-24 04:31:27
9 5004.24 IN 2014-12-24 04:31:27 2015-02-08 22:30:27
10 5004.24 Unresolved 2015-02-08 22:30:27 2015-02-08 22:31:31
11 5004.24 OUT 2015-02-08 22:31:31 2015-03-01 11:18:32
12 5004.24 Unresolved 2015-03-01 11:18:32 2015-03-01 19:34:36
13 5004.24 IN 2015-03-01 19:34:36 2015-03-09 10:11:55
14 5004.24 Unresolved 2015-03-09 10:11:55 2015-03-09 10:48:02
15 5004.24 OUT 2015-03-09 10:48:02 2015-03-23 09:10:25
16 5004.24 Unresolved 2015-03-23 09:10:25 2015-03-23 09:18:07
17 5004.24 IN 2015-03-23 09:18:07 2015-04-09 23:19:26
18 5004.24 Unresolved 2015-04-09 23:19:26 2015-04-09 23:21:40
19 5004.24 OUT 2015-04-09 23:21:40 2015-04-21 20:11:59
20 5004.24 Unresolved 2015-04-21 20:11:59 2015-04-22 16:33:54
21 5004.24 IN 2015-04-22 16:33:54 2015-06-10 09:19:12
22 5010.04 IN 2014-05-09 18:26:00 2015-04-21 18:28:16
23 5011.03 IN 2016-06-17 15:29:00 2016-07-21 14:23:34
24 5011.07 IN 2014-06-05 17:58:00 2014-12-08 15:15:07
25 5017.06 IN 2013-09-20 03:00:00 2016-08-18 18:13:04
26 5025.22 IN 2013-06-11 14:20:00 2013-06-11 14:20:01
27 5025.26 IN 2016-12-29 22:38:00 2018-01-08 16:51:42
28 5032.24 IN 2014-07-18 18:04:00 2015-06-13 12:44:02
29 5038.04 IN 2014-12-16 17:51:07 2015-04-10 23:28:12
30 5039.03 IN 2016-04-01 17:26:00 2016-07-02 04:39:15
31 5039.07 IN 2014-04-08 19:28:04 2014-04-08 19:53:06
32 5045 IN 2013-08-07 17:00:00 2015-04-08 18:28:43
33 5053.26 IN 2016-11-14 18:12:00 2017-12-24 20:21:31
34 5066.04 IN 2014-02-25 21:32:00 2014-02-25 21:32:01
35 5067.07 IN 2014-10-02 16:39:00 2014-12-08 16:23:50
36 5073.06 IN 2013-05-22 14:45:00 2013-05-22 14:45:01
37 5074.16 IN 2016-02-11 19:02:00 2016-03-09 18:21:45
38 5074.16 Unresolved 2016-03-09 18:21:45 2016-03-09 19:50:22
39 5074.16 OUT 2016-03-09 19:50:22 2016-03-12 23:26:45
40 5088.11 IN 2015-06-02 16:29:00 2015-08-08 11:07:46
41 5094.04 IN 2014-10-17 16:04:00 2014-10-31 16:39:28
42 5101.06 IN 2013-09-20 03:00:00 2013-09-20 03:00:01
43 5101.06 Unresolved 2013-09-20 03:00:01 2014-02-03 07:15:08
44 5101.06 OUT 2014-02-03 07:15:08 2014-04-17 12:15:34
45 5116.24 IN 2014-03-07 20:59:00 2015-03-07 22:21:52
46 5123.03 IN 2016-01-14 18:55:00 2016-03-09 19:21:20
47 5123.03 Unresolved 2016-03-09 19:21:20 2016-03-09 20:24:36
48 5123.03 OUT 2016-03-09 20:24:36 2016-03-09 21:51:51
49 5123.07 IN 2014-01-21 18:54:00 2014-05-20 15:23:09
50 5150.04 IN 2014-05-22 16:17:00 2016-03-07 14:10:00
What I'd like to do is divide the total time for each event into how much time each individual spent at each location (IN, OUT, or Unresolved), for each month in each year.
For example the first 3 rows above would become:
Tag location Month Year Duration_hrs
5004.24 IN 9 2014 18.50
5004.24 IN 10 2014 328.20
5004.24 Unresolved 10 2014 .06
5004.24 OUT 10 2014 345.74
5004.24 OUT 11 2014 82.09
I'm not familiar with a method for this yet. Is there a way to do this in dplyr or another package which would be useful?
dput:
structure(list(Tag = c("5004.24", "5004.24", "5004.24", "5004.24",
"5004.24", "5004.24", "5004.24", "5004.24", "5004.24", "5004.24",
"5004.24", "5004.24", "5004.24", "5004.24", "5004.24", "5004.24",
"5004.24", "5004.24", "5004.24", "5004.24", "5004.24", "5010.04",
"5011.03", "5011.07", "5017.06", "5025.22", "5025.26", "5032.24",
"5038.04", "5039.03", "5039.07", "5045", "5053.26", "5066.04",
"5067.07", "5073.06", "5074.16", "5074.16", "5074.16", "5088.11",
"5094.04", "5101.06", "5101.06", "5101.06", "5116.24", "5123.03",
"5123.03", "5123.03", "5123.07", "5150.04", "5157.06", "5158.16",
"5165.22", "5172.24", "5178.31", "5179.03", "5179.07", "5186.16",
"5186.16", "5186.16", "5186.16", "5186.16", "5186.16", "5186.16",
"5186.16", "5186.16", "5186.16", "5186.16", "5186.16", "5186.16",
"5186.16", "5186.16", "5200.11", "5206.31", "5214.16", "5228.24",
"5235", "5242.16", "5242.16", "5242.16", "5242.16", "5242.16",
"5249", "5249", "5249", "5256.24", "5256.24", "5256.24", "5256.24",
"5256.24", "5256.24", "5256.24", "5263.03", "5263.07", "5270.16",
"5284.11", "5290.31", "5290.31", "5290.31", "5290.31", "5298.16",
"5298.16", "5298.16", "5298.16", "5298.16", "5312.11", "5318.04",
"5325.06", "5326.16", "5326.16", "5326.16", "5326.16", "5326.16",
"5326.16", "5340.11", "5346.31", "5347.07", "5347.07", "5347.07",
"5347.07", "5347.07", "5347.07", "5347.07", "5347.07", "5347.07",
"5347.07", "5347.07", "5347.07", "5347.07", "5361.26", "5368.24",
"5374.04", "5374.04", "5374.04", "5374.04", "5374.04", "5375.03",
"5375.07", "5381.06", "5402.31", "5402.31", "5402.31", "5403.07",
"5403.07", "5403.07", "5403.07", "5403.07", "5403.07", "5403.07",
"5431.07", "5438.16", "5445.26", "5445.26", "5445.26", "5452.11",
"5465.06", "5465.06", "5465.06", "5480.24", "5487.03", "5487.03",
"5487.03", "5487.07", "5487.07", "5487.07", "5487.07", "5487.07",
"5487.07", "5487.07", "5487.07", "5487.07", "5487.07", "5487.07",
"5493.06", "5493.06", "5493.06", "5493.06", "5493.06", "5493.06",
"5493.06", "5493.06", "5493.06", "5501.22", "5508.11", "5514.31",
"5536.11", "5542.31", "5542.31", "5542.31", "5550.16", "5557.22",
"5564.24", "5570.04", "5571.03", "5571.07", "5577", "5585.26",
"5592.11", "5599.15", "5599.15"), location = structure(c(1L,
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L,
3L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 1L,
3L, 2L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L,
1L, 1L, 3L, 2L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 3L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 1L,
2L, 1L, 1L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L,
1L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 3L, 2L,
3L, 1L, 3L, 2L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L,
2L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 1L, 3L, 2L, 3L,
1L, 3L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("IN", "OUT", "Unresolved"
), class = "factor"), StartDateTime_UTC = structure(c(1412055000,
1413555139, 1413555343, 1415095531, 1415095686, 1415789428, 1415789493,
1419394828, 1419395487, 1423434627, 1423434691, 1425208712, 1425238476,
1425895915, 1425898082, 1427101825, 1427102287, 1428621566, 1428621700,
1429647119, 1429720434, 1399659960, 1466177340, 1401991080, 1379646000,
1370960400, 1483051080, 1405706640, 1418752267, 1459531560, 1396985284,
1375894800, 1479147120, 1393363920, 1412267940, 1369233900, 1455217320,
1457547705, 1457553022, 1433262540, 1413561840, 1379646000, 1379646001,
1391411708, 1394225940, 1452797700, 1457551280, 1457555076, 1390330440,
1400775420, 1373994000, 1477593060, 1374166980, 1401299940, 1447954320,
1456854360, 1393888860, 1472234940, 1489406928, 1489519785, 1493754358,
1493768930, 1493920595, 1495185477, 1495192085, 1496331260, 1496332668,
1502330689, 1504055913, 1504056565, 1504471969, 1504484502, 1446829680,
1450458660, 1478808060, 1391558460, 1375894800, 1483040340, 1489421085,
1489524492, 1500821550, 1500840149, 1379430000, 1397144701, 1397145001,
1418064072, 1425499236, 1425499420, 1426550367, 1426705419, 1456257069,
1456260660, 1471541820, 1407518280, 1477072440, 1443805740, 1425491220,
1426570277, 1445317719, 1445333300, 1452196320, 1464295591, 1464296126,
1465051334, 1465057497, 1438967700, 1398371160, 1380208260, 1472238360,
1473868480, 1473870631, 1500912421, 1500916070, 1501200103, 1450372080,
1429632420, 1417450780, 1425319859, 1425319875, 1425476848, 1425477093,
1425490129, 1425490424, 1425497300, 1425497428, 1425497915, 1425499161,
1426058600, 1426059054, 1458149820, 1416942495, 1399653120, 1411058361,
1411058655, 1425581564, 1425582554, 1463157660, 1399655460, 1373994000,
1423859700, 1426434141, 1426438127, 1418054706, 1423434226, 1423434240,
1424431972, 1424432003, 1425225584, 1425226167, 1413566460, 1463156700,
1457457262, 1464341296, 1464345439, 1438964700, 1385483040, 1392229993,
1392230720, 1397856480, 1454698803, 1457554029, 1457557132, 1392415980,
1394995835, 1394996053, 1394999500, 1394999554, 1398891904, 1398892060,
1400546392, 1410656812, 1410657169, 1412681302, 1383667200, 1394063203,
1394064827, 1394534414, 1394546963, 1394903943, 1394904559, 1404290854,
1404295213, 1374166980, 1450460220, 1444934640, 1443814080, 1423862520,
1425398953, 1425403026, 1482517054, 1382107500, 1406303160, 1401292020,
1459530420, 1397852580, 1375806600, 1459355400, 1436547420, 1434128700,
1455181152), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
EndDateTime_UTC = structure(c(1413555139, 1413555343, 1415095531,
1415095686, 1415789428, 1415789493, 1419394828, 1419395487,
1423434627, 1423434691, 1425208712, 1425238476, 1425895915,
1425898082, 1427101825, 1427102287, 1428621566, 1428621700,
1429647119, 1429720434, 1433927952, 1429640896, 1469111014,
1418051707, 1471543984, 1370960401, 1515430302, 1434199442,
1428708492, 1467434355, 1396986786, 1428517723, 1514146891,
1393363921, 1418055830, 1369233901, 1457547705, 1457553022,
1457825205, 1439032066, 1414773568, 1379646001, 1391411708,
1397736934, 1425766912, 1457551280, 1457555076, 1457560311,
1400599389, 1457359800, 1499495745, 1477593061, 1374166981,
1401396204, 1447954321, 1462853353, 1402068722, 1489406928,
1489519785, 1493754358, 1493768930, 1493915356, 1495185477,
1495192085, 1496331260, 1496332668, 1502328647, 1504055913,
1504056565, 1504471969, 1504484502, 1510842039, 1449871257,
1467438121, 1489417066, 1399047138, 1402529845, 1489421085,
1489524492, 1500821550, 1500840149, 1501331232, 1397144701,
1397145001, 1397146277, 1425499236, 1425499420, 1426550367,
1426550808, 1456257069, 1456260660, 1457306762, 1471541821,
1407522783, 1489406240, 1447974368, 1426563298, 1445317719,
1445333300, 1460371267, 1464295591, 1464296126, 1465051334,
1465057497, 1475823163, 1438969110, 1429569640, 1380208261,
1473868480, 1473870631, 1500912421, 1500916070, 1501192164,
1522212002, 1455503331, 1429632421, 1425319859, 1425319875,
1425476848, 1425477093, 1425490129, 1425490424, 1425497300,
1425497428, 1425497915, 1425499161, 1426058600, 1426059054,
1435973349, 1518286934, 1424550436, 1411058361, 1411058655,
1425581564, 1425582554, 1448200301, 1467374409, 1411930669,
1485696012, 1426434141, 1426438127, 1426439051, 1423434226,
1423434240, 1424431972, 1424432003, 1425225584, 1425226167,
1426387273, 1414861317, 1465669005, 1464341296, 1464345439,
1464345554, 1438964701, 1392229993, 1392230720, 1402254843,
1403572223, 1457554029, 1457557132, 1457575616, 1394995835,
1394996053, 1394999500, 1394999554, 1398891904, 1398892060,
1400546392, 1410656812, 1410657169, 1412681302, 1412892331,
1394063203, 1394064827, 1394534414, 1394546963, 1394903943,
1394904559, 1404290854, 1404295213, 1414516048, 1374166981,
1450460221, 1445287124, 1451876530, 1425398953, 1425403026,
1433699058, 1503067936, 1394903394, 1406514937, 1411706590,
1460054294, 1413725736, 1375806601, 1485062993, 1448650763,
1455181152, 1455181271), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -200L))
Upvotes: 1
Views: 524
Reputation: 11908
In these situations, I like to write a little helper function that takes a start and an end time, and splits that into a number of smaller periods at a given unit size:
library(tidyverse)
library(lubridate)
split_interval <- function(start, end, unit) {
breaks <- seq(floor_date(start, unit), ceiling_date(end, unit), by = unit)
timeline <- c(start, breaks[breaks > start & breaks < end], end)
tibble(.start = head(timeline, -1), .end = tail(timeline, -1))
}
split_interval(now(), now() + hours(2), unit = "hours")
#> # A tibble: 3 x 2
#> .start .end
#> <dttm> <dttm>
#> 1 2019-11-08 23:33:18 2019-11-09 00:00:00
#> 2 2019-11-09 00:00:00 2019-11-09 01:00:00
#> 3 2019-11-09 01:00:00 2019-11-09 01:33:18
With this in hand, we can split the intervals on each row, giving a list column containing data frames of periods that each row falls into:
tbl_split <- tbl %>%
mutate(periods = map2(StartDateTime_UTC, EndDateTime_UTC,
split_interval, unit = "months"))
tbl_split
#> # A tibble: 3 x 5
#> Tag location StartDateTime_UTC EndDateTime_UTC periods
#> <chr> <fct> <dttm> <dttm> <list>
#> 1 5004.24 IN 2014-09-30 05:30:00 2014-10-17 14:12:19 <tibble [2 x ~
#> 2 5004.24 Unresolved 2014-10-17 14:12:19 2014-10-17 14:15:43 <tibble [1 x ~
#> 3 5004.24 OUT 2014-10-17 14:15:43 2014-11-04 10:05:31 <tibble [2 x ~
(I'm only using the first three rows in tbl
to keep the output small. Code at the end.)
Then, we can unnest()
to create a longer data frame with each original row now split into multiple rows of shorter periods:
tbl_split_long <- tbl_split %>% unnest(periods)
tbl_split_long %>%
select(location, .start, .end)
#> # A tibble: 5 x 3
#> location .start .end
#> <fct> <dttm> <dttm>
#> 1 IN 2014-09-30 05:30:00 2014-10-01 00:00:00
#> 2 IN 2014-10-01 00:00:00 2014-10-17 14:12:19
#> 3 Unresolved 2014-10-17 14:12:19 2014-10-17 14:15:43
#> 4 OUT 2014-10-17 14:15:43 2014-11-01 00:00:00
#> 5 OUT 2014-11-01 00:00:00 2014-11-04 10:05:31
Now we can use the start of each period to determine the year and month, and then it’s just a matter of a groupped summary to get the end result:
tbl_split_long %>%
group_by(Tag, location, year = year(.start), month = month(.start)) %>%
summarize(duration = sum(difftime(.end, .start, units = "hours")))
#> # A tibble: 5 x 5
#> # Groups: Tag, location, year [3]
#> Tag location year month duration
#> <chr> <fct> <dbl> <dbl> <drtn>
#> 1 5004.24 IN 2014 9 18.50000000 hours
#> 2 5004.24 IN 2014 10 398.20527778 hours
#> 3 5004.24 OUT 2014 10 345.73805556 hours
#> 4 5004.24 OUT 2014 11 82.09194444 hours
#> 5 5004.24 Unresolved 2014 10 0.05666667 hours
Data:
tbl <- structure(
list(
Tag = c("5004.24", "5004.24", "5004.24"),
location = structure(
c(1L, 3L, 2L),
.Label = c("IN", "OUT", "Unresolved"),
class = "factor"
),
StartDateTime_UTC = structure(
c(1412055000, 1413555139, 1413555343),
class = c("POSIXct", "POSIXt"),
tzone = "UTC"
),
EndDateTime_UTC = structure(
c(1413555139, 1413555343, 1415095531),
class = c("POSIXct", "POSIXt"),
tzone = "UTC"
)
),
row.names = c(NA, 3L),
class = c("tbl_df", "tbl", "data.frame")
)
Upvotes: 2