eyei
eyei

Reputation: 402

How to overwrite values after grouping but preserving the initial value for the group

I would like to do a simple overwriting in R, but I guess is not as simple as I thought it would be. I have a beginning, an end, a majorCategory and a subCategory. For every majorCategory, there is a beginning and an end. I have the length of the majorCategory and I calculate the length of the subcategories, and from now on I am stuck because I would like to overwrite the beginnings and the ends for the subcategories but at the same time preserve the initial beginnings for the majorCategory . How do I do that?

 df <- structure(list(majorCat = structure(c(1L, 1L, 1L, 1L, 1L, 1L,  1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
 1L), .Label = c("Major_A", "Major_B"), class = "factor"), begin = c(60, 60, 60, 60, 60, 60, 60,60,60, 
 60, 20, 20, 20, 20, 20), end = c(75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 80, 80, 80, 80, 80), subCat 
 = c("d", "b", "c", "e", "a", "b", "e", "a", "d", "b", "d", "b", "d", "e", "b")), row.names = c("1", 
 "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"), class = "data.frame")


  library(dplyr)
  df %>% 
     group_by(begin, end, subCat) %>% 
     summarise(countt = n()) %>% 
     mutate(percc = countt/sum(countt),
     lengthh = end - begin,
     lengthhSubCat = lengthh * percc) %>% select(begin, end, lengthh , lengthhSubCat)

enter image description here

I would very much like to get to this:
enter image description here

Thank you very very much for your time!

Upvotes: 1

Views: 149

Answers (1)

akrun
akrun

Reputation: 886948

After creating the columns, create a new group by pasteing the 'begin', 'end' ('grp'), concatenate the first element of 'begin' with the last element removed from 'lengthhSubCat', do a cumulative sum to create the 'begin' column, while we take the lead of 'begin' and fill the last observation with the corresponding value of 'end' to update the 'end' column

library(dplyr)
library(stringr)
df %>% 
   group_by(begin, end, subCat) %>%
   summarise(countt = n()) %>%
   mutate(percc = countt/sum(countt),
         lengthh = end - begin,
         lengthhSubCat = lengthh * percc) %>% 
   group_by(grp = str_c(begin, end, sep="_")) %>% 
   mutate(begin = cumsum(c(first(begin), lengthhSubCat[-n()])),
         end = lead(begin, default = last(end))) %>%
   ungroup %>%
   select(begin, end, lengthh, lengthhSubCat)
# A tibble: 8 x 4
#  begin   end lengthh lengthhSubCat
#  <dbl> <dbl>   <dbl>         <dbl>
#1  20    44        60          24  
#2  44    68        60          24  
#3  68    80        60          12  
#4  60    63        15           3  
#5  63    67.5      15           4.5
#6  67.5  69        15           1.5
#7  69    72        15           3  
#8  72    75        15           3  

Upvotes: 2

Related Questions