Reputation: 85
I have a data frame with a time series, a set of events, and an ID. I've joined two data frames by a shared time column, but I'd like to fill in the gaps in events.
Here is a simplified example, with the value column that I used to join the data frames, then the event and ID columns. I want to somehow fill in the columns for events and ID between the first (start) and last (end) event entry.
a<-c(seq(from = 150, to = 213, by = 3))
b<-c("start","","","mid","","end","", "",
"start", "", "", "end", "", "", "",
"start", "", "end", "start", "mid", "", "end")
c<-c("A","","","A","","A","", "",
"A", "", "", "A", "", "", "",
"B", "", "B", "B", "B", "", "B")
(data<-data.frame(value = a, event = b, ID = c))
Here is the goal, with the events and ID filled in between start and end:
agoal<-c(seq(from = 150, to = 213, by = 3))
bgoal<-c("start","start","start","mid","mid","end","", "",
"start", "start", "start", "end", "", "", "",
"start", "start", "end", "start", "mid", "mid", "end")
cgoal<-c("A","A","A","A","A","A","", "",
"A", "A", "A", "A", "", "", "",
"B", "B", "B", "B", "B", "B", "B")
(goal<-data.frame(value = agoal, event = bgoal, ID = cgoal))
Upvotes: 1
Views: 805
Reputation: 887391
Here is another option with data.table
data.frame
to data.table
- setDT
.SDcols
- 'event', 'ID'""
) with NA
- na_if
na.locf0
(from zoo
) to fill the NA elements with previous non-NA and assign (:=
) back to the columns.I
) where the 'event' value is duplicated
and where 'event' is "end" grouped by the run-length-id (rleid
) of event$V1
) and assign those 'event', 'ID' to blanklibrary(data.table)
library(zoo)
library(dplyr)
setDT(data)[, c("event", "ID") := lapply(.SD, function(x)
na.locf0(na_if(x, ""))), .SDcols = event:ID]
data[data[, .I[duplicated(event) & event == "end"] ,
rleid(event)]$V1, c("event", "ID") := .("", "")]
-output
data
value event ID
1: 150 start A
2: 153 start A
3: 156 start A
4: 159 mid A
5: 162 mid A
6: 165 end A
7: 168
8: 171
9: 174 start A
10: 177 start A
11: 180 start A
12: 183 end A
13: 186
14: 189
15: 192
16: 195 start B
17: 198 start B
18: 201 end B
19: 204 start B
20: 207 mid B
21: 210 mid B
22: 213 end B
Upvotes: 3
Reputation: 16988
You could use dplyr
and tidyr
for this task:
library(tidyr)
library(dplyr)
data %>%
mutate(grp = cumsum(case_when(event == "end" ~ -1,
event == "start" ~ 1,
TRUE ~ 0)),
across(c(-value, -grp), ~ ifelse(.x == "" & grp == 1, NA_character_, .x))) %>%
fill(c(-value), .direction="down") %>%
select(-grp)
returns
value event ID
1 150 start A
2 153 start A
3 156 start A
4 159 mid A
5 162 mid A
6 165 end A
7 168
8 171
9 174 start A
10 177 start A
11 180 start A
12 183 end A
13 186
14 189
15 192
16 195 start B
17 198 start B
18 201 end B
19 204 start B
20 207 mid B
21 210 mid B
22 213 end B
Upvotes: 1