Erik De Luca

Reputation: 307

Warning about rank transformation not taking weights into account in coin::kruskal_test in R

I'm using the coin::kruskal_test function in R to perform a weighted Kruskal-Wallis test, but I'm encountering a warning that the rank transformation doesn't take weights into account.

Here's the code that produces the warning:

library(tidyverse)
library(coin)
library(purrr)

set.seed(1)
punteggi = 
  tibble(
    Codice = paste0("cod_", 1:200),
    Regione = sample(c("FVG", "Lazio", "Sicilia"), size = 200, replace = TRUE, prob = c(.4, .2, .4)),
    Genere = sample(c("Femminile", "Maschile"), size = 200, replace = TRUE, prob = c(.6, .4)),
    Area = sample(c("Urbana", "Extra urbana"), size = 200, replace = TRUE, prob = c(.6, .4)),
  )



punteggi = punteggi |> 
  group_by(Regione, Genere, Area) |> 
  mutate(
    mu = runif(1, 4, 7),
    sd = runif(1, 1, 2),
    weight = n()
  ) |> 
  ungroup() |>
  rowwise() |> 
  mutate(pt_tot = rnorm(1, mu, sd))

punteggi

kruskal_per_group_weighted <- function(data) {
  # Return NA instead of calling pvalue() on NULL when the test fails
  tryCatch({
    test_result <- coin::kruskal_test(pt_tot ~ value, data = data, weights = ~ weight)
    coin::pvalue(test_result)
  }, error = function(e) {
    message("Error in kruskal_test: ", conditionMessage(e))
    NA_real_
  })
}

kruskal_per_group_NO_weighted <- function(data) {
  # Return NA instead of calling pvalue() on NULL when the test fails
  tryCatch({
    test_result <- coin::kruskal_test(pt_tot ~ value, data = data)
    coin::pvalue(test_result)
  }, error = function(e) {
    message("Error in kruskal_test: ", conditionMessage(e))
    NA_real_
  })
}

punteggi |> 
  pivot_longer(c(Regione, Genere, Area)) |> 
  mutate(across(value, as.factor)) |> 
  nest(.by = name) |> 
  mutate(
    kruskal = map_dbl(data, kruskal_per_group_NO_weighted),
    kruskal_weighted = map_dbl(data, kruskal_per_group_weighted)
    )

The specific warning message I receive is:

    <warning/rlang_warning>
    Warning in `mutate()`:
    i In argument: `kruskal_weighted = map_dbl(data, kruskal_per_group_weighted)`.
    Caused by warning in `ft()`:
    ! rank transformation doesn't take weights into account
    ---
    Backtrace:
        x
     1. +-dplyr::mutate(...)
     2. \-dplyr:::mutate.data.frame(...)

Additionally, the p-values for the weighted and non-weighted Kruskal-Wallis tests are different, as shown in the following table:

(Image: tibble of results)

Is there a way to address this warning and ensure that the rank transformation properly accounts for weights in coin::kruskal_test? Or is there an alternative approach to conducting a weighted Kruskal-Wallis test in R?

Thank you for your assistance.

Answers (1)

Thomas Lumley

Reputation: 2765

You could just repeat each observation as many times as the weights, since that's all kruskal_test is doing.
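
A minimal sketch of that idea, assuming the weights are positive integers that can be read as replication counts. The `toy` data frame and its values are made up for illustration; only the column names (`pt_tot`, `value`, `weight`) follow the question.

```r
library(coin)

set.seed(1)
# Hypothetical toy data: a score, a grouping factor, integer case weights
toy <- data.frame(
  pt_tot = rnorm(12, mean = 5, sd = 1.5),
  value  = factor(rep(c("FVG", "Lazio", "Sicilia"), each = 4)),
  weight = rep(c(1L, 2L, 3L), times = 4)
)

# Repeat each row as many times as its weight
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$weight), ]

# Ordinary (unweighted) Kruskal-Wallis test on the expanded data,
# so the ranks are computed over the replicated observations
res <- coin::kruskal_test(pt_tot ~ value, data = expanded)
p_expanded <- as.numeric(coin::pvalue(res))
p_expanded
```

In the pipeline from the question, `tidyr::uncount(data, weight)` should do the same expansion before calling the unweighted helper. Because the ranks are then computed on the expanded rows (with ties among the copies), the p-value need not match the one from `weights = ~ weight`, and no warning about the rank transformation should be raised since no weights are passed.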
