Reputation: 450
I continuously measured length of 4 objects (A,B,C,D) across time.
I have 2 data frames i.e. data1 and data2 as shown below...
data1 contains a measurement data, where length was measured across time. data 2 contains categories (A,B,C,D) with their time intervals of measurement.
data1 =structure(list(Time = c("9/18/2019 15:30", "9/18/2019 15:31",
"9/18/2019 15:32", "9/18/2019 15:33", "9/18/2019 15:34", "9/18/2019 15:35",
"9/18/2019 15:36", "9/18/2019 15:37", "9/18/2019 15:38", "9/18/2019 15:39",
"9/18/2019 15:40", "9/18/2019 15:41", "9/18/2019 15:42", "9/18/2019 15:43",
"9/18/2019 15:44", "9/18/2019 15:45", "9/18/2019 15:46", "9/18/2019 15:47",
"9/18/2019 15:48", "9/18/2019 15:49", "9/18/2019 15:50", "9/18/2019 15:51",
"9/18/2019 15:52", "9/18/2019 15:53", "9/18/2019 15:54", "9/18/2019 15:55",
"9/18/2019 15:56", "9/18/2019 15:57", "9/18/2019 15:58", "9/18/2019 15:59",
"9/18/2019 16:00", "9/18/2019 16:01", "9/18/2019 16:02", "9/18/2019 16:03",
"9/18/2019 16:04", "9/18/2019 16:05", "9/18/2019 16:06", "9/18/2019 16:07",
"9/18/2019 16:08", "9/18/2019 16:09", "9/18/2019 16:10", "9/18/2019 16:11",
"9/18/2019 16:12", "9/18/2019 16:13", "9/18/2019 16:14", "9/18/2019 16:15",
"9/18/2019 16:16", "9/18/2019 16:17"), length = c(36, 95, 45,
45, 50, 47, 45, 43, 40, 38, 36, 33, 31, 78, 5, 45, 10, 45, 80,
116, 151, 186, 222, 257, 292, 328, 45, 454, 57, 197, 203, 209,
215, 221, 227, 233, 239, 45, 65, 85, 105, 45, 45, 15, 5, 78,
96, 127)), .Names = c("Time", "length"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -48L), spec = structure(list(
cols = structure(list(Time = structure(list(), class = c("collector_character",
"collector")), length = structure(list(), class = c("collector_double",
"collector"))), .Names = c("Time", "length")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
data1$Time = as.POSIXct(strptime(data1$Time, format="%m/%d/%Y %H:%M"))
data2 = structure(list(Class = c("A", "B", "C", "D"), From = c("9/18/2019 15:31",
"9/18/2019 15:45", "9/18/2019 15:56", "9/18/2019 16:07"), To = c("9/18/2019 15:40",
"9/18/2019 15:53", "9/18/2019 16:05", "9/18/2019 16:16")), .Names = c("Class",
"From", "To"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L), spec = structure(list(cols = structure(list(Class = structure(list(), class = c("collector_character",
"collector")), From = structure(list(), class = c("collector_character",
"collector")), To = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Class", "From", "To")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
data2$From = as.POSIXct(strptime(data2$From, format="%m/%d/%Y %H:%M"))
data2$To = as.POSIXct(strptime(data2$To, format="%m/%d/%Y %H:%M"))
I want to have a merged data frame, where the categories have their length measured within their time intervals (as shown below).
Any suggestion will be appreciated. Let me know if you need any more details.
p.s The data attached is a toy data. The actual data1 contains 200k rows. I have provided the code for Posixct time conversion.
Upvotes: 0
Views: 63
Reputation: 58
You can use the function right_join
from dplyr
package:
df <- right_join(data2,data1, by = c("From" = "Time"))
df <- df %>% select(-From)
head(df,11)
Output:
# A tibble: 11 x 3
Class To length
<chr> <chr> <dbl>
1 NA NA 36
2 NA NA 95
3 NA NA 45
4 NA NA 45
5 NA NA 50
6 NA NA 47
7 NA NA 45
8 NA NA 43
9 NA NA 40
10 NA NA 38
11 A 9/18/2019 15:40 36
If you want only the matches, you can use inner_join
instead of right_join
df <- inner_join(data2,data1, by = c("From" = "Time"))
df <- df %>% select(-From)
df
# A tibble: 4 x 3
Class To length
<chr> <chr> <dbl>
1 A 9/18/2019 15:40 36
2 B 9/18/2019 15:53 257
3 C 9/18/2019 16:05 233
4 D 9/18/2019 16:16 96
Upvotes: 1
Reputation: 1116
You can do it with nested ifelse()
functions:
data1$Class=ifelse(data1$Time>data2$From[1] & data1$Time<data2$To[1],"A",
ifelse(data1$Time>data2$From[2] & data1$Time<data2$To[2],"B",
ifelse(data1$Time>data2$From[3] & data1$Time<data2$To[3],"C",
ifelse(data1$Time>data2$From[4] & data1$Time<data2$To[4],"D",NA))))
Values for events not falling in any of the 4 category will be NA
Upvotes: 1