Josep Espasa

Reputation: 759

Quantiles using sample weights

I am trying to compute the quantiles of a vector using sample weights. One of the very few functions I have found that does this is Hmisc::wtd.quantile(). The results seem to depend heavily on the scale of the weights (i.e. their mean), and I don't understand why (e.g. shouldn't the 10th percentile of a variable stay the same if all weights are multiplied by a constant?). The differences show up when the function is applied to the tibbles weighted, small_weights (the same weights multiplied by 0.1) and scaled_weights (the same weights rescaled to have mean 1).

Also, none of the results matches the unweighted quantiles obtained by treating wt as frequency weights, i.e. replicating each row according to its weight with tidyr::uncount() (the weights are doubled first so that every row count is an integer; see the duplicated_rows tibble below).
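A base-R sketch of the duplicated-rows approach, using rep() in place of tidyr::uncount() (assumed equivalent for this simple case): each value is repeated wt * 2 times, which gives 30 rows, and the ordinary quantile() of that sample reproduces the first output in the reprex.

```r
# Values and sample weights from the question
var_ <- seq(0, 10)
wt <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)

# Repeat each value wt * 2 times (doubling makes every count an integer),
# mirroring what tidyr::uncount(weighted, wt * 2) does to the rows
expanded <- rep(var_, times = wt * 2)
length(expanded)  # 30 rows, i.e. sum(wt * 2)

# Ordinary (type 7) quantiles of the expanded sample
quantile(expanded, probs = seq(0, 1, 0.1))
```

Since the doubled counts sum to 30 while wt itself sums to 15, reading wt directly as frequencies describes a different, smaller sample, so the two sets of quantiles need not coincide.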

Can someone help me understand why this happens? Is there a way to make the weighted quantiles not depend on the scale of the sample weights?

Many thanks.

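library(Hmisc)
library(tidyr)

# Values 0 to 10 with sample weights (the weights sum to 15)
weighted <- tibble::tibble(var_ = seq(0, 10),
                           wt = c(2, 0.5, 2, 0.5,
                                  2, 1, 2, 0.5,
                                  2, 0.5, 2))

# Use wt as frequency weights; doubled so that every row count is an integer
duplicated_rows <- tidyr::uncount(weighted, wt * 2)

# Same weights multiplied by 0.1
small_weights <- tibble::tibble(var_ = seq(0, 10),
                                wt = c(2, 0.5, 2, 0.5,
                                       2, 1, 2, 0.5,
                                       2, 0.5, 2) * 0.1)

# Same weights rescaled to have mean 1
scaled_weights <- weighted
scaled_weights$wt <- weighted$wt / mean(weighted$wt)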


quantile(duplicated_rows[["var_"]], probs = seq(0,1, 0.1))
#>   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
#>  0.0  0.0  2.0  2.7  4.0  5.0  6.0  7.3  8.0 10.0 10.0

Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0,1, 0.1))
#>   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
#>  0.0  0.8  2.0  3.2  4.0  5.0  6.0  7.6  8.2  9.6 10.0

Hmisc::wtd.quantile(small_weights[["var_"]], small_weights[["wt"]], probs = seq(0,1, 0.1))
#>   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
#>  6.0  6.2  6.4  6.6  6.8  7.0  7.2  7.4  7.6  7.8  8.0

Hmisc::wtd.quantile(scaled_weights[["var_"]], scaled_weights[["wt"]], probs = seq(0,1, 0.1))
#>   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
#>    0    2    2    4    4    6    6    8    8   10   10

Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0,1, 0.1), type = "(i-1)/(n-1)")
#>    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
#>  0.00  0.80  1.65  3.10  3.80  5.00  5.70  7.15  7.85  9.30 10.00

Created on 2021-08-22 by the reprex package (v2.0.0)

Upvotes: 2

Views: 3984

Answers (1)

Waldi

Reputation: 41260

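It looks like the normwt = TRUE argument is needed. From the Hmisc documentation:

If weights are frequency weights, then normwt should be FALSE, and if weights are normalization (aka reliability) weights, then normwt should be TRUE.

With the default normwt = FALSE the weights are read as frequencies, so the effective number of observations is sum(wt): 15 for weighted, 1.5 for small_weights and 11 for scaled_weights, which is why the results change with the scale. With normwt = TRUE the weights are first rescaled to sum to length(x) (11 here), so any constant factor cancels out. This also explains why scaled_weights, whose weights already sum to 11, gives the same quantiles with or without normwt = TRUE.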
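A base-R sketch of that rescaling, where normalize_weights() is a hypothetical helper written for this illustration (not part of Hmisc): dividing by sum(w) removes any constant factor, so the original, 0.1-scaled and mean-1 weights all normalize to the same vector.

```r
# Sample weights from the question
wt <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)

# Hypothetical helper: rescale the weights so they sum to the number of
# observations, as the Hmisc documentation describes for normwt = TRUE
normalize_weights <- function(w) w * length(w) / sum(w)

w_orig   <- normalize_weights(wt)
w_small  <- normalize_weights(wt * 0.1)
w_scaled <- normalize_weights(wt / mean(wt))

all.equal(w_orig, w_small)   # TRUE: the factor 0.1 cancels out
all.equal(w_orig, w_scaled)  # TRUE: the factor 1 / mean(wt) cancels out
sum(w_orig)                  # 11, i.e. length(wt)
```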

Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0,1, 0.1), normwt = TRUE)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
   0    2    2    4    4    6    6    8    8   10   10 

Hmisc::wtd.quantile(small_weights[["var_"]], small_weights[["wt"]], probs = seq(0,1, 0.1), normwt = TRUE)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
   0    2    2    4    4    6    6    8    8   10   10 

Hmisc::wtd.quantile(scaled_weights[["var_"]], scaled_weights[["wt"]], probs = seq(0,1, 0.1), normwt = TRUE)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
   0    2    2    4    4    6    6    8    8   10   10 
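Without Hmisc, a weighted quantile that is scale-invariant by construction can be sketched in base R by inverting the weighted empirical CDF, i.e. taking the smallest value whose cumulative share of the total weight reaches p. This is a different definition from the interpolating one used by Hmisc::wtd.quantile(), so the numbers will not match it in general; wtd_quantile_ecdf() is a hypothetical name, and the 1e-12 tolerance is an assumption added so that p values exactly equal to a cumulative share (such as 0.3 here) are not tipped to the next value by floating-point rounding.

```r
# Hypothetical helper: weighted quantile as the inverse of the weighted ECDF.
# Only the shares cumsum(w) / sum(w) are used, so rescaling w cannot change it.
wtd_quantile_ecdf <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  share <- cumsum(w[ord]) / sum(w)
  # Smallest index whose cumulative share reaches p (with a small tolerance)
  idx <- vapply(probs, function(p) which(share >= p - 1e-12)[1], integer(1))
  setNames(x[idx], paste0(probs * 100, "%"))
}

var_ <- seq(0, 10)
wt <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)
probs <- seq(0, 1, 0.1)

wtd_quantile_ecdf(var_, wt, probs)
wtd_quantile_ecdf(var_, wt * 0.1, probs)       # same result
wtd_quantile_ecdf(var_, wt / mean(wt), probs)  # same result
```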

Upvotes: 4
