Reputation: 13
I am trying to figure out how to assign group id based on time intervals in R.
More context: I have merged GPS data (lat/lon data points, recorded in irregular intervals) with acceleration data (ACC "bursts" of 82 data points, recorded at the start of every minute - all 82 data points in one burst have the same timestamp).
As GPS points and ACC bursts were collected simultaneously, I now want to group GPS points with the associated ACC bursts: assign all GPS and ACC data that ocurr within the same minute, a unique group id.
EDIT: Here are some sample data. I want to group the GPS point in row 8 to the ACC data within the same minute (in this case above the GPS point).
structure(list(X.1 = 1:11, timestamp = c("2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:00Z", "2019-01-26T16:25:47Z", "2019-01-26T16:26:00Z", "2019-01-26T16:26:00Z", "2019-01-26T16:26:00Z"), sensor.type = c("acceleration", "acceleration", "acceleration", "acceleration", "acceleration", "acceleration", "acceleration", "gps", "acceleration", "acceleration", "acceleration"), location.long = c(NA, NA, NA, NA, NA, NA, NA, 44.4777343, NA, NA, NA), location.lat = c(NA, NA, NA, NA, NA, NA, NA, -12.2839707, NA, NA, NA), annotation = c("Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing", "Moving/Climbing"), X = c(2219L, 1694L, 1976L, 1744L, 2014L, 2202L, 2269L, NA, 1874L, 2024L, 1990L), Y = c(1416L, 1581L, 1524L, 1620L, 1409L, 1545L, 1771L, NA, 1687L, 1773L, 1813L), Z = c(2189L, 2209L, 2121L, 2278L, 2003L, 2034L, 2060L, NA, 2431L, 2504L, 2428L)), class = "data.frame", row.names = c(NA, -11L))
X.1 timestamp sensor.type location.long location.lat annotation X Y Z
1 1 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 2219 1416 2189
2 2 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 1694 1581 2209
3 3 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 1976 1524 2121
4 4 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 1744 1620 2278
5 5 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 2014 1409 2003
6 6 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 2202 1545 2034
7 7 2019-01-26T16:25:00Z acceleration NA NA Moving/Climbing 2269 1771 2060
8 8 2019-01-26T16:25:47Z gps 44.47773 -12.28397 Moving/Climbing NA NA NA
9 9 2019-01-26T16:26:00Z acceleration NA NA Moving/Climbing 1874 1687 2431
10 10 2019-01-26T16:26:00Z acceleration NA NA Moving/Climbing 2024 1773 2504
11 11 2019-01-26T16:26:00Z acceleration NA NA Moving/Climbing 1990 1813 2428
Does that make sense? I know lubridate can summarize data based on time intervals but how do I add a new group id (variable) based on timestamps?
Upvotes: 0
Views: 119
Reputation: 146224
Here's a solution using dplyr
and lubridate
. We convert your timestamp
column to a proper datetime class, add a new column rounding down to the nearest minute, and then create an ID based on the rounded timestamp:
library(dplyr)
library(lubridate)
dat %>%
mutate(
timestamp = ymd_hms(timestamp),
minute = floor_date(timestamp, unit = "minute"),
group_id = as.integer(factor(minute))
)
# X.1 timestamp sensor.type location.long location.lat annotation X Y Z
# 1 1 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 2219 1416 2189
# 2 2 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 1694 1581 2209
# 3 3 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 1976 1524 2121
# 4 4 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 1744 1620 2278
# 5 5 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 2014 1409 2003
# 6 6 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 2202 1545 2034
# 7 7 2019-01-26 16:25:00 acceleration NA NA Moving/Climbing 2269 1771 2060
# 8 8 2019-01-26 16:25:47 gps 44.47773 -12.28397 Moving/Climbing NA NA NA
# 9 9 2019-01-26 16:26:00 acceleration NA NA Moving/Climbing 1874 1687 2431
# 10 10 2019-01-26 16:26:00 acceleration NA NA Moving/Climbing 2024 1773 2504
# 11 11 2019-01-26 16:26:00 acceleration NA NA Moving/Climbing 1990 1813 2428
# minute group_id
# 1 2019-01-26 16:25:00 1
# 2 2019-01-26 16:25:00 1
# 3 2019-01-26 16:25:00 1
# 4 2019-01-26 16:25:00 1
# 5 2019-01-26 16:25:00 1
# 6 2019-01-26 16:25:00 1
# 7 2019-01-26 16:25:00 1
# 8 2019-01-26 16:25:00 1
# 9 2019-01-26 16:26:00 2
# 10 2019-01-26 16:26:00 2
# 11 2019-01-26 16:26:00 2
Upvotes: 0