How to overwrite values after grouping while preserving the initial value for the group

Question

I would like to do a simple overwrite in R, but it is not as simple as I thought it would be. I have a beginning, an end, a majorCategory and a subCategory. Every majorCategory has a beginning and an end. I have the length of each majorCategory, and I can calculate the length of its subcategories, but from there I am stuck: I would like to overwrite the beginnings and the ends for the subcategories while preserving the initial beginning of the majorCategory. How do I do that?

  df <- structure(list(
    majorCat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L),
                         .Label = c("Major_A", "Major_B"), class = "factor"),
    begin = c(60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 20, 20, 20, 20, 20),
    end = c(75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 80, 80, 80, 80, 80),
    subCat = c("d", "b", "c", "e", "a", "b", "e", "a", "d", "b", "d", "b", "d", "e", "b")),
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"),
    class = "data.frame")


  library(dplyr)
  df %>%
    group_by(begin, end, subCat) %>%
    summarise(countt = n()) %>%
    mutate(percc = countt / sum(countt),
           lengthh = end - begin,
           lengthhSubCat = lengthh * percc) %>%
    select(begin, end, lengthh, lengthhSubCat)

I would very much like to get to this:

Thank you very very much for your time!

akrun · Accepted Answer

After creating the columns, build a new grouping column ('grp') by pasting 'begin' and 'end' together. A separate column is needed because dplyr does not let mutate() modify a column that is currently a grouping variable. Within each 'grp', concatenate the first element of 'begin' with 'lengthhSubCat' minus its last element and take the cumulative sum; this gives the new 'begin' column. To update 'end', take the lead of the new 'begin' and fill the last observation with the group's original 'end'.

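library(dplyr)
library(stringr)
df %>% 
   group_by(begin, end, subCat) %>%
   summarise(countt = n()) %>%              # result is still grouped by begin, end
   mutate(percc = countt/sum(countt),       # share of each subCat within its group
         lengthh = end - begin,
         lengthhSubCat = lengthh * percc) %>% 
   # regroup on a helper key so that begin and end themselves can be overwritten
   group_by(grp = str_c(begin, end, sep="_")) %>% 
   mutate(begin = cumsum(c(first(begin), lengthhSubCat[-n()])),  # running start
         end = lead(begin, default = last(end))) %>%             # next start, or group end
   ungroup() %>%
   select(begin, end, lengthh, lengthhSubCat)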
# A tibble: 8 x 4
#  begin   end lengthh lengthhSubCat
#  <dbl> <dbl>   <dbl>         <dbl>
#1  20    44        60          24  
#2  44    68        60          24  
#3  68    80        60          12  
#4  60    63        15           3  
#5  63    67.5      15           4.5
#6  67.5  69        15           1.5
#7  69    72        15           3  
#8  72    75        15           3
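
To see the cumulative-sum and lead step in isolation, here is a minimal sketch on plain vectors for the group that runs from 20 to 80. The names group_begin, group_end and sub_lengths are made up for this illustration and do not appear in the pipeline above; the three lengths are that group's 'lengthhSubCat' values.

```r
library(dplyr)

# Hypothetical stand-alone values for the group that runs from 20 to 80
group_begin <- 20
group_end   <- 80
sub_lengths <- c(24, 24, 12)

# Each segment starts where the previous one ended: put the group's start
# first, drop the last length, and take the running total.
new_begin <- cumsum(c(group_begin, sub_lengths[-length(sub_lengths)]))

# Each segment ends where the next one starts; the last one ends at the
# group's end.
new_end <- lead(new_begin, default = group_end)

data.frame(begin = new_begin, end = new_end, lengthhSubCat = sub_lengths)

# Equivalent way to get the ends: each start plus its own length
all.equal(new_end, new_begin + sub_lengths)
```

The new starts come out as 20, 44, 68 and the new ends as 44, 68, 80, matching the first three rows of the result. Because the lengths of a group add up to the group's total length, computing 'end' as the new 'begin' plus 'lengthhSubCat' should give the same values as the lead-based version.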
