Francisco
Francisco

Reputation: 161

Duplicate Row by condition and change type

I'm dealing with school data and I need to duplicate rows which serve larger grade spans that just ES or HS. Some sample code to illustrate my problem:

# data
schools <- tibble(name = c("school a", "school b", "school c", "school z"), 
                  type = c("es", "NA", "hs", "es"),
                  gslo = c("01", "08", "09", "KG"),
                  gshi = c("12", "12", "12", "05"))
schools
  name     type  gslo  gshi 
  <chr>    <chr> <chr> <chr>
1 school a es    01    12   
2 school b NA    08    12   
3 school c hs    09    12   
4 school z es    KG    05   

Where gslo and gshi are the lowest and highest grades served, respectively. In the United States these would be divided into high schools or middle schools or elementary schools, the type.

Some schools serve more than just elementary grades, but are now only being counted as type == "es".

schools_attempt <- schools %>%
# add row based on condition and change type 
# not generalized 
  rbind(schools %>% filter(gslo == "01", gshi == "12") %>% mutate(type = "hs"))
> schools_attempt
# A tibble: 5 x 4
  name     type  gslo  gshi 
  <chr>    <chr> <chr> <chr>
1 school a es    01    12   
2 school b NA    08    12   
3 school c hs    09    12   
4 school z es    KG    05   
5 school a hs    01    12   

This works but is not general. Is it possible to avoid a huge case_when? Note the changed school type classification (es -> hs)

schools_want <- tibble(name = c("school a", "school b", "school c", "school z", "school a"), 
                       type = c("es", "NA", "hs", "es", "hs"),
                       gslo = c("01", "08", "09", "KG", "01"),
                       gshi = c("12", "12", "12", "05", "12"))
> schools_want
# A tibble: 5 x 4
  name     type  gslo  gshi 
  <chr>    <chr> <chr> <chr>
1 school a es    01    12   
2 school b NA    08    12   
3 school c hs    09    12   
4 school z es    KG    05   
5 school a hs    01    12 

Thanks!

Upvotes: 0

Views: 40

Answers (1)

Jon Spring
Jon Spring

Reputation: 66415

This might suffice as a general approach. If it starts in grade 9+, it's a high school. If it ends before grade 9, it's an elementary. Otherwise, it's both and we can split into two rows.

library(dplyr)
schools %>% 
  mutate(across(gslo:gshi, ~if_else(.x == "KG", 0, as.numeric(.x))),
         type2 = case_when(
           gslo >= 9 ~ "hs",
           gshi <= 8 ~ "es",
           TRUE ~ "hs, es"
         )) %>%
  separate_rows(type2)



# A tibble: 6 x 5
  name     type   gslo  gshi type2
  <chr>    <chr> <dbl> <dbl> <chr>
1 school a es        1    12 hs   
2 school a es        1    12 es   
3 school b NA        8    12 hs   
4 school b NA        8    12 es   
5 school c hs        9    12 hs   
6 school z es        0     5 es  

Edit: if you want to preserve the gslo/gshi columns as-is, add .names = "{.col}_num"), to the across() call and use gslo_num and gshi_num in the case_when.

Upvotes: 1

Related Questions