Reputation: 23757
I have a long data frame in which a start and end day are assigned to an action. Some actions might only have a start day, and one action type can start and end multiple times.
I would now like to pivot this wide, so that there is one row for each new start of one action. (see desired output). For this I need to assign a unique ID which clearly identifies the group by which I am pivoting.
I think I could solve this with rle, but I cannot figure out how to use the information of two columns to assign the correct ID for pivoting.
library(tidyverse)
library(data.table)
x <- c("start", "end")
foo <- data.frame(time = c(rep(x, 3), "start", x, "start"),
action = rep(letters[1:4], times = c(4,2,3,1)),
day = 1:10)
## my approach gets stuck with the second rle
foo %>%
mutate(rle_time = rleid(time),
rle_action = rleid(action))
#> time action day rle_time rle_action
#> 1 start a 1 1 1
#> 2 end a 2 2 1
#> 3 start a 3 3 1
#> 4 end a 4 4 1
#> 5 start b 5 5 2
#> 6 end b 6 6 2
#> 7 start c 7 7 3
#> 8 start c 8 7 3
#> 9 end c 9 8 3
#> 10 start d 10 9 4
## don't know how to use this information to assign the correct ID to the rows
## so that I can pivot wider correctly.
## desired output
data.frame(action = rep(letters[1:4], times = c(2,1,2,1)),
start = c(seq(1,7,2),8,10),
end = c(seq(2,6,2), NA,9,NA)
)
#> action start end
#> 1 a 1 2
#> 2 a 3 4
#> 3 b 5 6
#> 4 c 7 NA
#> 5 c 8 9
#> 6 d 10 NA
Created on 2022-06-07 by the reprex package (v2.0.1)
Upvotes: 1
Views: 103
Reputation: 887153
Here, we could use cumsum
to create the index column after grouping by 'action' and then do reshaping to 'wide' format with either pivot_wider
(from tidyr
) or use data.table::dcast
library(dplyr)
library(tidyr)
foo %>%
group_by(action) %>%
mutate(rn = cumsum(time == 'start')) %>%
ungroup %>%
pivot_wider(names_from = time, values_from = day) %>%
select(-rn)
Upvotes: 3