Mariano C Giglio

Reputation: 135

R: perform Cohen's Kappa test between all possible combinations of variables

I have the following data frame:

structure(list(test1 = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1), 
    test2 = c(0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1), test3 = c(0, 
    0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1), test4 = c(1, 0, 1, 1, 1, 
    1, 0, 1, 0, 1, 0, 1), test5 = c(0, 0, 1, 1, 0, 1, 1, 1, 0, 
    1, 0, 1), test6 = c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

Each variable/column corresponds to a test (test1, test2, test3, test4...) and holds that test's result (1 or 0) for each observation.

I would like to calculate the Kappa statistic for all possible pairs of variables and collect the results for these pairs in a data frame, like this:

structure(list(...1 = c("test1-test2", "test1-test3", "test1-test4", 
"test2-test1"), `z-score` = c(NA, NA, NA, NA), kappa = c(NA, 
NA, NA, NA), `p-value` = c(NA, NA, NA, NA)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

Can someone help me?

Thank you!

Upvotes: 2

Views: 736

Answers (2)

StupidWolf

Reputation: 46908

Your data:

test <- structure(list(test1 = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1), 
    test2 = c(0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1), test3 = c(0, 
    0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1), test4 = c(1, 0, 1, 1, 1, 
    1, 0, 1, 0, 1, 0, 1), test5 = c(0, 0, 1, 1, 0, 1, 1, 1, 0, 
    1, 0, 1), test6 = c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

Use combn to get all possible comparisons:

PAIRS = combn(names(test),2)
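
As a rough illustration of what combn returns here, the sketch below uses only the six column names from the question. The names tests, pairs and ordered_pairs are made up for this example. Unweighted Cohen's kappa does not depend on the order of the two raters, so the 15 unordered pairs should be enough; the expand.grid variant is only relevant if you also want rows such as test2-test1, as in the question's example output.

```r
# Column names from the question's data frame
tests <- paste0("test", 1:6)

# Unordered pairs: combn returns a matrix with one column per pair
pairs <- combn(tests, 2)
dim(pairs)           # 2 rows, choose(6, 2) = 15 columns
pairs[, 1]           # "test1" "test2"

# Ordered pairs (test1-test2 and test2-test1 are both listed)
ordered_pairs <- subset(
  expand.grid(a = tests, b = tests, stringsAsFactors = FALSE),
  a != b
)
nrow(ordered_pairs)  # 30
```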

Use irr and iterate through the combinations:

library(irr)
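# For each column of PAIRS (one pair of test names), run Cohen's kappa
all_results = apply(PAIRS,2,function(i){
  # test[,i] selects the two columns being compared
  result = kappa2(test[,i], "unweighted")
  # data.frame() rewrites 'z-score' and 'p-value' as z.score and p.value;
  # add check.names=FALSE to keep the hyphens
  data.frame(
    'comparison'=paste(i,collapse="-"),
    'z-score'=result$statistic,
    'kappa'=result$value,
    'p-value'=result$p.value
  )
})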

The result is a list of one-row data frames, so we combine them into a single data.frame:

all_results = do.call(rbind,all_results)

    comparison     z.score       kappa      p.value
1  test1-test2 -0.09897433 -0.02857143 0.9211586502
2  test1-test3  0.58554004  0.16666667 0.5581846494
3  test1-test4 -0.82807867 -0.23529412 0.4076259477
4  test1-test5 -0.09897433 -0.02857143 0.9211586502
5  test1-test6  0.58554004  0.16666667 0.5581846494
6  test2-test3  0.58554004  0.16666667 0.5581846494
7  test2-test4  1.65615734  0.47058824 0.0976899593
8  test2-test5  3.46410162  1.00000000 0.0005320055
9  test2-test6  0.58554004  0.16666667 0.5581846494
10 test3-test4 -1.22474487 -0.33333333 0.2206713619
11 test3-test5  0.58554004  0.16666667 0.5581846494
12 test3-test6  3.46410162  1.00000000 0.0005320055
13 test4-test5  1.65615734  0.47058824 0.0976899593
14 test4-test6 -1.22474487 -0.33333333 0.2206713619
15 test5-test6  0.58554004  0.16666667 0.5581846494
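
To sanity-check one row of this table without irr, here is a minimal base-R sketch of the unweighted kappa formula, (p_obs - p_exp) / (1 - p_exp). manual_kappa is a hypothetical helper written for this illustration, not a function from irr, and it only returns the kappa value (no z-score or p-value).

```r
test1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1)
test2 <- c(0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1)

# Hypothetical helper: unweighted Cohen's kappa for two rating vectors
manual_kappa <- function(x, y) {
  lv  <- sort(unique(c(x, y)))                 # shared set of categories
  tab <- table(factor(x, levels = lv), factor(y, levels = lv))
  p   <- tab / sum(tab)                        # joint proportions
  p_obs <- sum(diag(p))                        # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p))        # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

manual_kappa(test1, test2)  # about -0.0286, the kappa in row 1 above
```

For this pair the observed agreement is 6/12 and the chance agreement is 74/144, which gives -2/70.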

Upvotes: 1

jon

Reputation: 370

You will need the irr package installed (although you can swap in any other implementation of the test). I named your original dataset dfr1 and the resulting dataset dfr2. The code loops over every ordered pair of distinct columns, so both test1-test2 and test2-test1 appear, and collects the results of each test:

# Empty result frame; one row is appended per pair inside the loop
dfr2 <- data.frame(pair = character(), z_score = numeric(), kappa = numeric(), p_value = numeric())
for(i in seq_len(ncol(dfr1))){
  for(j in seq_len(ncol(dfr1))){
    # Skip comparing a column with itself
    if(i != j){
      tst <- irr::kappa2(dfr1[,c(i,j)])
      dfr2 <- rbind(dfr2, data.frame(pair = paste0(names(dfr1)[c(i,j)], collapse = "-"),
                                     z_score = tst$statistic,
                                     kappa = tst$value,
                                     p_value = tst$p.value))
    }
  }
}

Upvotes: 1
