Reputation: 402
I would like to do a simple overwriting in R, but I guess is not as simple as I thought it would be.
I have a beginning
, an end
, a majorCategory
and a subCategory
. For every majorCategory
, there is a beginning
and an end
. I have the length of the majorCategory
and I calculate the length of the subcategories
, and from now on I am stuck because I would like to overwrite the beginnings and the ends for the subcategories but at the same time preserve the initial beginnings for the majorCategory
. How do I do that?
df <- structure(list(majorCat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
1L), .Label = c("Major_A", "Major_B"), class = "factor"), begin = c(60, 60, 60, 60, 60, 60, 60,60,60,
60, 20, 20, 20, 20, 20), end = c(75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 80, 80, 80, 80, 80), subCat
= c("d", "b", "c", "e", "a", "b", "e", "a", "d", "b", "d", "b", "d", "e", "b")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"), class = "data.frame")
library(dplyr)
df %>%
group_by(begin, end, subCat) %>%
summarise(countt = n()) %>%
mutate(percc = countt/sum(countt),
lengthh = end - begin,
lengthhSubCat = lengthh * percc) %>% select(begin, end, lengthh , lengthhSubCat)
I would very much like to get to this:
Thank you very very much for your time!
Upvotes: 1
Views: 149
Reputation: 886948
After creating the columns, create a new group by paste
ing the 'begin', 'end' ('grp'), concatenate the first
element of 'begin' with the last element removed from 'lengthhSubCat', do a cumulative sum to create the 'begin' column, while we take the lead
of 'begin' and fill the last
observation with the corresponding value of 'end' to update the 'end' column
library(dplyr)
library(stringr)
df %>%
group_by(begin, end, subCat) %>%
summarise(countt = n()) %>%
mutate(percc = countt/sum(countt),
lengthh = end - begin,
lengthhSubCat = lengthh * percc) %>%
group_by(grp = str_c(begin, end, sep="_")) %>%
mutate(begin = cumsum(c(first(begin), lengthhSubCat[-n()])),
end = lead(begin, default = last(end))) %>%
ungroup %>%
select(begin, end, lengthh, lengthhSubCat)
# A tibble: 8 x 4
# begin end lengthh lengthhSubCat
# <dbl> <dbl> <dbl> <dbl>
#1 20 44 60 24
#2 44 68 60 24
#3 68 80 60 12
#4 60 63 15 3
#5 63 67.5 15 4.5
#6 67.5 69 15 1.5
#7 69 72 15 3
#8 72 75 15 3
Upvotes: 2