Tim Wilcox
Tim Wilcox

Reputation: 1331

Issues with combining a case when into a cumsum calculation in R

Below is the sample data and my attempt at this. My primary question is how I would get the smbsummary3 data frame to show values of small = 2, 3, or 4 when they do not exist in the source data. My summarise section calculates correctly. Do I need to add a case_when statement prior to the summarise or treat the lack of 2,3,4 as NA?

emp <- c(1,2,3,4,5,6,7,8,1,12,54,101,33,159,201,261)
small <- c(1,1,1,1,1,1,1,1,1,1,1,2,1,3,3,4)
area <-c(001,001,001,001,001,001,001,001,003,003,003,003,003,003,003,003)

smbtest2 <- data.frame(emp,small,area)

 smbtest2 <- smbtest2 %>% mutate(smb = case_when(emp >=0 & emp <100 ~ "1",emp >=0 & emp <150 ~ "2",emp >=0 & emp <250 ~ "3", emp >=0 & emp <500 ~ "4",emp >=0 & emp <1000000 ~ "Total"))

smbsummary3<-smbtest2 %>% 
group_by(area,small) %>%
summarise(emp = sum(emp), worksites = n(), 
        .groups = 'drop_last') %>% 
mutate(emp = cumsum(emp),
     worksites = cumsum(worksites)) 

Schema for the small variable.

  Emp               smb
  0 to 100           1
  0 to 150           2
  0 to 250           3
  0 to 500           4

Desired result

  area       small    emp   worksites
  001          1       36      8
  001          2       36      8
  001          3       36      8
  001          4       36      8
  003          1       100     4
  003          2       201     5
  003          3       561     7
  003          4       822     8

Upvotes: 1

Views: 74

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 102251

Here is a data.table option using left-join and nafill

setDT(smbsummary3)[
  smbsummary3[, .SD[, unique(smbsummary3[, .(small)])], area],
  on = .(area, small)
][
  ,
  lapply(.SD, nafill, type = "locf")
]

which gives

   area small emp worksites
1:    1     1  36         8
2:    1     2  36         8
3:    1     3  36         8
4:    1     4  36         8
5:    3     1 100         4
6:    3     2 201         5
7:    3     3 561         7
8:    3     4 822         8

Upvotes: 1

akrun
akrun

Reputation: 887511

If we need the missing combinations, do a complete

library(dplyr)
library(tidyr)
smbsummary3 %>% 
    ungroup %>% 
    complete(area, small = unique(small)) %>% 
    fill(emp, worksites)

-output

# A tibble: 8 x 4
#   area small   emp worksites
#  <dbl> <dbl> <dbl>     <int>
#1     1     1    36         8
#2     1     2    36         8
#3     1     3    36         8
#4     1     4    36         8
#5     3     1   100         4
#6     3     2   201         5
#7     3     3   561         7
#8     3     4   822         8

Upvotes: 1

Related Questions