Reputation: 161
I'm dealing with school data and I need to duplicate rows which serve larger grade spans that just ES or HS. Some sample code to illustrate my problem:
# data
schools <- tibble(name = c("school a", "school b", "school c", "school z"),
type = c("es", "NA", "hs", "es"),
gslo = c("01", "08", "09", "KG"),
gshi = c("12", "12", "12", "05"))
schools
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
Where gslo
and gshi
are the lowest and highest grades served, respectively. In the United States these would be divided into high schools or middle schools or elementary schools, the type
.
Some schools serve more than just elementary grades, but are now only being counted as type
== "es".
schools_attempt <- schools %>%
# add row based on condition and change type
# not generalized
rbind(schools %>% filter(gslo == "01", gshi == "12") %>% mutate(type = "hs"))
> schools_attempt
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
This works but is not general. Is it possible to avoid a huge case_when? Note the changed school type classification (es -> hs)
schools_want <- tibble(name = c("school a", "school b", "school c", "school z", "school a"),
type = c("es", "NA", "hs", "es", "hs"),
gslo = c("01", "08", "09", "KG", "01"),
gshi = c("12", "12", "12", "05", "12"))
> schools_want
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
Thanks!
Upvotes: 0
Views: 40
Reputation: 66415
This might suffice as a general approach. If it starts in grade 9+, it's a high school. If it ends before grade 9, it's an elementary. Otherwise, it's both and we can split into two rows.
library(dplyr)
schools %>%
mutate(across(gslo:gshi, ~if_else(.x == "KG", 0, as.numeric(.x))),
type2 = case_when(
gslo >= 9 ~ "hs",
gshi <= 8 ~ "es",
TRUE ~ "hs, es"
)) %>%
separate_rows(type2)
# A tibble: 6 x 5
name type gslo gshi type2
<chr> <chr> <dbl> <dbl> <chr>
1 school a es 1 12 hs
2 school a es 1 12 es
3 school b NA 8 12 hs
4 school b NA 8 12 es
5 school c hs 9 12 hs
6 school z es 0 5 es
Edit: if you want to preserve the gslo/gshi columns as-is, add .names = "{.col}_num"),
to the across()
call and use gslo_num
and gshi_num
in the case_when.
Upvotes: 1