Reputation: 1331
Below is the sample data and my attempt at this. My primary question is how I would get the smbsummary3 data frame to show values of small = 2, 3, or 4 when they do not exist in the source data. My summarise section calculates correctly. Do I need to add a case_when statement prior to the summarise or treat the lack of 2,3,4 as NA?
emp <- c(1,2,3,4,5,6,7,8,1,12,54,101,33,159,201,261)
small <- c(1,1,1,1,1,1,1,1,1,1,1,2,1,3,3,4)
area <-c(001,001,001,001,001,001,001,001,003,003,003,003,003,003,003,003)
smbtest2 <- data.frame(emp,small,area)
smbtest2 <- smbtest2 %>% mutate(smb = case_when(emp >=0 & emp <100 ~ "1",emp >=0 & emp <150 ~ "2",emp >=0 & emp <250 ~ "3", emp >=0 & emp <500 ~ "4",emp >=0 & emp <1000000 ~ "Total"))
smbsummary3<-smbtest2 %>%
group_by(area,small) %>%
summarise(emp = sum(emp), worksites = n(),
.groups = 'drop_last') %>%
mutate(emp = cumsum(emp),
worksites = cumsum(worksites))
Schema for the small variable.
Emp smb
0 to 100 1
0 to 150 2
0 to 250 3
0 to 500 4
Desired result
area small emp worksites
001 1 36 8
001 2 36 8
001 3 36 8
001 4 36 8
003 1 100 4
003 2 201 5
003 3 561 7
003 4 822 8
Upvotes: 1
Views: 74
Reputation: 102251
Here is a data.table
option using left-join and nafill
setDT(smbsummary3)[
smbsummary3[, .SD[, unique(smbsummary3[, .(small)])], area],
on = .(area, small)
][
,
lapply(.SD, nafill, type = "locf")
]
which gives
area small emp worksites
1: 1 1 36 8
2: 1 2 36 8
3: 1 3 36 8
4: 1 4 36 8
5: 3 1 100 4
6: 3 2 201 5
7: 3 3 561 7
8: 3 4 822 8
Upvotes: 1
Reputation: 887511
If we need the missing combinations, do a complete
library(dplyr)
library(tidyr)
smbsummary3 %>%
ungroup %>%
complete(area, small = unique(small)) %>%
fill(emp, worksites)
-output
# A tibble: 8 x 4
# area small emp worksites
# <dbl> <dbl> <dbl> <int>
#1 1 1 36 8
#2 1 2 36 8
#3 1 3 36 8
#4 1 4 36 8
#5 3 1 100 4
#6 3 2 201 5
#7 3 3 561 7
#8 3 4 822 8
Upvotes: 1