LuizZ

Reputation: 1044

Add column with quantile rank, within groups, using sample weight in R/dplyr

I have a data set called have with the following structure:

have <- tibble::tibble(id = 1:30,  
                       state = rep(c("A", "B", "C"), each = 10), 
                       score = c(147, 735, 519, 458, 599, 628, 988, 787, 298, 612,
                                 319, 715, 248, 637, 239, 254, 601, 702, 902, 867,
                                 343, 535, 730, 518, 277, 612, 869, 865, 227, 641), 
                       weight = c(3.13, 1.46, 2.57, 4.39, 1.32, 3.81, 1.29, 1.58, 2.74, 4.13,
                                   1.43, 1.29, 1.81, 3.87, 3.10, 1.18, 4.15, 4.35, 3.35, 3.59,
                                   4.69, 3.38, 3.51, 3.35, 2.60, 1.99, 2.34, 4.60, 3.77, 1.31))

I would like to add two columns: wtercile, with weighted terciles, and wquartile, with weighted quartiles. Both should rank score within each state while taking the sample weights in weight into account.

The weight variable is a frequency expansion weight for computing point estimates. It is needed because not all students who should have taken the test actually did, so the institution responsible for the test calculated weights for students based on the attendance rate of each school and socioeconomic group. A weight of 2 therefore means that the student should be counted as two students when computing the state average.
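To make the meaning of the weights concrete, here is a small sketch with made-up numbers (the scores and weights below are illustrative only, not taken from the data set). It checks that a weighted average with integer frequency weights matches the plain average of the expanded data:

```r
# Illustrative values only: two students, the first with weight 2
scores  <- c(100, 200)
weights <- c(2, 1)

# Weighted average using base R's stats::weighted.mean()
weighted_avg <- weighted.mean(scores, weights)

# Same average computed by literally repeating each score `weight` times
expanded_avg <- mean(rep(scores, times = weights))

# Both approaches agree
isTRUE(all.equal(weighted_avg, expanded_avg))
```

The real weights are not integers, so the expansion trick does not apply to them directly, but the interpretation is the same.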

I prefer {dplyr} syntax, but couldn't find a way to use weights with it. The dplyr::ntile() function does not handle weights.

Without weights it would be something like:

library(dplyr)

have %>%
  group_by(state) %>%
  mutate(
    wtercile = ntile(score, 3),
    wquartile = ntile(score, 4)
  ) %>%
  ungroup()
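For comparison, a weighted variant might look roughly like the sketch below. The helper weighted_ntile() is hypothetical (it is not part of dplyr), and it rests on one assumed convention: an observation goes into bin k when the cumulative share of weight up to and including it falls in ((k - 1)/n, k/n]. Other conventions (for example, using the midpoint of each observation's weight) would give different bins near the cut points. Missing values and ties get no special handling, and the toy data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: assigns each value of x to one
# of n bins based on its cumulative share of the total weight.
weighted_ntile <- function(x, w, n) {
  ord <- order(x)                          # positions of x in ascending order
  cum_share <- cumsum(w[ord]) / sum(w)     # cumulative weight share, sorted
  bins <- ceiling(cum_share * n)           # map shares in (0, 1] to 1..n
  bins <- pmin(n, pmax(1, bins))           # guard against rounding overshoot
  out <- integer(length(x))
  out[ord] <- as.integer(bins)             # put bins back in original order
  out
}

# Made-up toy data: in state A, the lowest score carries half the weight
toy <- data.frame(
  state  = rep(c("A", "B"), each = 4),
  score  = c(40, 10, 30, 20, 5, 15, 25, 35),
  weight = c(1, 3, 1, 1, 1, 1, 1, 1)
)

result <- toy %>%
  group_by(state) %>%
  mutate(whalf = weighted_ntile(score, weight, 2)) %>%
  ungroup()
```

With equal weights (state B) the bins match what ntile() would give; in state A the heavily weighted lowest score fills the first bin on its own. Calling the helper with n = 3 or n = 4 would give terciles or quartiles under the same assumed convention.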

Upvotes: 2

Views: 86
