I am trying to compute the quantiles of a vector using sample weights. One of the very few functions I have found that does this is Hmisc::wtd.quantile(). The results I get seem to depend heavily on the scale of the weights (i.e. their mean), and I don't understand why (e.g. shouldn't the 10th percentile of a variable stay the same if all the weights are multiplied by a constant?). You can see the differences when the function is applied to the tibbles weighted, small_weights (the same weights as weighted multiplied by 0.1) and scaled_weights (the same weights rescaled to a mean of 1).
Also, none of the results matches the unweighted quantiles obtained by treating the sample weights wt as frequency weights, i.e. replicating each row according to its weight with tidyr::uncount() (see the duplicated_rows tibble below, where wt is doubled so that the counts are whole numbers).
Can someone help me understand why this happens? Is there a way to make the weighted quantiles not depend on the scale of the sample weights?
Many thanks.
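library(Hmisc)
library(tidyr)
# Sample weights (they sum to 15)
weighted <- tibble::tibble(var_ = seq(0, 10),
                           wt = c(2, 0.5, 2, 0.5,
                                  2, 1, 2, 0.5,
                                  2, 0.5, 2))
# Frequency-weight version: replicate each row wt * 2 times (whole-number counts)
duplicated_rows <- tidyr::uncount(weighted, wt * 2)
# Same weights multiplied by 0.1 (they sum to 1.5)
small_weights <- tibble::tibble(var_ = seq(0, 10),
                                wt = c(2, 0.5, 2, 0.5,
                                       2, 1, 2, 0.5,
                                       2, 0.5, 2) * 0.1)
# Same weights rescaled to mean 1 (they sum to 11, the number of rows)
scaled_weights <- weighted
scaled_weights$wt <- weighted$wt / mean(weighted$wt)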
quantile(duplicated_rows[["var_"]], probs = seq(0,1, 0.1))
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> 0.0 0.0 2.0 2.7 4.0 5.0 6.0 7.3 8.0 10.0 10.0
Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0,1, 0.1))
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> 0.0 0.8 2.0 3.2 4.0 5.0 6.0 7.6 8.2 9.6 10.0
Hmisc::wtd.quantile(small_weights[["var_"]], small_weights[["wt"]], probs = seq(0,1, 0.1))
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0
Hmisc::wtd.quantile(scaled_weights[["var_"]], scaled_weights[["wt"]], probs = seq(0,1, 0.1))
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> 0 2 2 4 4 6 6 8 8 10 10
Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0,1, 0.1), type = "(i-1)/(n-1)")
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> 0.00 0.80 1.65 3.10 3.80 5.00 5.70 7.15 7.85 9.30 10.00
Created on 2021-08-22 by the reprex package (v2.0.0)
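The total weight goes a long way towards explaining the differences. Below is a minimal base R sketch, assuming (this is an assumption about the Hmisc internals, not something the output above proves) that the default type = "quantile" method uses n = sum(weights) as the sample size and reads off the type-7 position 1 + (n - 1) * p on the cumulative-weight scale. The names totals and positions are illustrative only.

```r
# Same base weights as in the reprex above
w <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)

# Total weight of each version: under the assumption above, this plays the role of n
totals <- c(weighted = sum(w),
            small    = sum(w * 0.1),
            scaled   = sum(w / mean(w)))

# Type-7 style position for p = 0, 0.1 and 1 on the cumulative-weight scale
positions <- sapply(totals, function(n) 1 + (n - 1) * c(0, 0.1, 1))
rownames(positions) <- c("p=0", "p=0.1", "p=1")

print(totals)
print(positions)
```

Under that assumption, small_weights squeezes the whole range p = 0 to 1 into cumulative weights between 1 and 1.5, i.e. into the upper part of the data, which is consistent with all of its quantiles falling between 6 and 8 above. Weights with mean 1 sum to the number of rows (11), so their positions are the same ones an unweighted quantile() on 11 values would use.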
Looks like the normwt = TRUE argument is needed. From the Hmisc help page:
"If weights are frequency weights, then normwt should be FALSE, and if weights are normalization (aka reliability) weights, then normwt should be TRUE."
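Assuming normwt = TRUE simply rescales the weights so that they sum to the number of observations (which is how the Hmisc help page describes it), a weight vector and any positive multiple of it become identical after normalization, and both coincide with weights that have mean 1. A minimal base R sketch; normalize_weights is a hypothetical helper, not an Hmisc function:

```r
# Hypothetical helper mimicking the assumed effect of normwt = TRUE:
# rescale the weights so that they sum to the number of observations
normalize_weights <- function(w) w * length(w) / sum(w)

w <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)

norm_weighted <- normalize_weights(w)        # original weights
norm_small    <- normalize_weights(w * 0.1)  # weights multiplied by 0.1
mean_one      <- w / mean(w)                 # the question's scaled_weights$wt

# All three should be the same vector, up to floating-point error
print(all.equal(norm_weighted, norm_small))
print(all.equal(norm_weighted, mean_one))
```

This would also explain why the scaled_weights result in the question already matches the normwt = TRUE results: weights with mean 1 already sum to the number of rows, so normalizing them changes nothing.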
Hmisc::wtd.quantile(weighted[["var_"]], weighted[["wt"]], probs = seq(0, 1, 0.1), normwt = TRUE)
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#>  0   2   2   4   4   6   6   8   8  10  10
Hmisc::wtd.quantile(small_weights[["var_"]], small_weights[["wt"]], probs = seq(0, 1, 0.1), normwt = TRUE)
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#>  0   2   2   4   4   6   6   8   8  10  10
Hmisc::wtd.quantile(scaled_weights[["var_"]], scaled_weights[["wt"]], probs = seq(0, 1, 0.1), normwt = TRUE)
#> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#>  0   2   2   4   4   6   6   8   8  10  10
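For completeness, here is a base R sketch of the lookup that appears to sit behind the default type = "quantile" method: sort the values, accumulate the weights, compute the type-7 position 1 + (n - 1) * p with n = sum(w), and interpolate between the values found at floor(position) and the next position. wtd_quantile_sketch is a hypothetical name, and the code is a simplified re-implementation written for illustration (not the Hmisc source); it assumes strictly positive weights and no missing values. With whole-number frequency weights and normwt = FALSE it should agree with quantile() on the expanded data, and with normwt = TRUE it should no longer depend on the scale of the weights.

```r
# Simplified, hypothetical re-implementation for illustration only.
# Assumes strictly positive weights and no missing values.
wtd_quantile_sketch <- function(x, w, probs, normwt = FALSE) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  # normwt = TRUE: rescale so that the weights sum to the number of observations
  if (normwt) w <- w * length(x) / sum(w)
  n    <- sum(w)                  # total weight plays the role of the sample size
  pos  <- 1 + (n - 1) * probs     # type-7 position on the cumulative-weight scale
  low  <- pmax(floor(pos), 1)
  high <- pmin(low + 1, n)
  frac <- pos %% 1
  cw   <- cumsum(w)
  # Smallest x whose cumulative weight reaches v (step-function lookup)
  lookup <- function(v) approx(cw, x, xout = v, method = "constant", f = 1, rule = 2)$y
  (1 - frac) * lookup(low) + frac * lookup(high)
}

x <- 0:10
w <- c(2, 0.5, 2, 0.5, 2, 1, 2, 0.5, 2, 0.5, 2)
p <- seq(0, 1, 0.1)

# Frequency weights (w * 2 are whole numbers): compare with the expanded data
freq_sketch   <- wtd_quantile_sketch(x, w * 2, p)
freq_expanded <- unname(quantile(rep(x, times = w * 2), probs = p))

# Normalized weights: multiplying w by a constant should not change the result
norm_w     <- wtd_quantile_sketch(x, w, p, normwt = TRUE)
norm_small <- wtd_quantile_sketch(x, w * 0.1, p, normwt = TRUE)

print(rbind(freq_sketch, freq_expanded))
print(rbind(norm_w, norm_small))
```

If this reading is right, the duplicated_rows quantiles in the question differ from the first Hmisc call only because uncount() was given wt * 2 while wtd.quantile() was given wt; passing the same whole-number counts with normwt = FALSE should make them line up.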