gisol

Reputation: 754

Why does the calculation of Cohen's kappa fail across different packages on this contingency table?

I have a contingency table for which I would like to calculate Cohen's kappa, the level of agreement. I have tried three different packages, all of which seem to fail to some degree. The package e1071 has a function specifically for a contingency table, but that too seems to fail. Below is reproducible code. You will need to install the packages concord, e1071, and irr.

# Recreate my contingency table, output with dput
conf.mat<-structure(c(810531L, 289024L, 164757L, 114316L), .Dim = c(2L, 
2L), .Dimnames = structure(list(landsat_2000_bin = c("0", "1"
), MOD12_2000_binForest = c("0", "1")), .Names = c("landsat_2000_bin", 
"MOD12_2000_binForest")), class = "table")

library(concord)
cohen.kappa(conf.mat)
library(e1071)
classAgreement(conf.mat, match.names=TRUE)
library(irr)
kappa2(conf.mat) 

The output I get from running this is:

> cohen.kappa(conf.mat)
Kappa test for nominally classified data
4 categories - 2 methods
kappa (Cohen) = 0 , Z = NaN , p = NaN 
kappa (Siegel) = -0.333333 , Z = -0.816497 , p = 0.792892 
kappa (2*PA-1) = -1 

> classAgreement(conf.mat, match.names=TRUE)
$diag
[1] 0.6708459
$kappa
[1] NA
$rand
[1] 0.5583764
$crand
[1] 0.0594124
Warning message:
In ni[lev] * nj[lev] : NAs produced by integer overflow

> kappa2(conf.mat) 
 Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 2 
Raters = 2 
Kappa = 0 
z = NaN 
p-value = NaN

Could anyone advise on why these might fail? I have a large dataset, but as this table is simple I didn't think that could cause such problems.

Upvotes: 4

Views: 2141

Answers (2)

nograpes

Reputation: 18323

In the first function, cohen.kappa, you need to specify that you are passing count data rather than an n*m matrix of n subjects and m raters.

cohen.kappa(conf.mat,'count')

The second function is trickier. Your matrix is stored as integer rather than numeric (double). R integers are 32-bit, so they cannot hold anything larger than 2147483647 (.Machine$integer.max). When two of your large totals are multiplied together, the result overflows and becomes NA. For example:

i=975288 
j=1099555
class(i)
# [1] "numeric"
i*j
# 1.072383e+12
as.integer(i)*as.integer(j)
# [1] NA
# Warning message:
# In as.integer(i) * as.integer(j) : NAs produced by integer overflow

So you need to convert your matrix from integer to numeric before passing it in.

# classAgreement(conf.mat)
classAgreement(matrix(as.numeric(conf.mat),nrow=2))
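As a rough sketch of the same idea in base R only (no packages assumed), the snippet below rebuilds the table from the question, shows that it is stored as integer, and switches the storage mode to double. The name conf.num is just an illustrative choice, and using storage.mode<- is one possible alternative to the matrix(as.numeric(...)) call above that keeps the dimnames and the "table" class.

```r
# Rebuild the table from the question (counts stored as 32-bit integers).
conf.mat <- structure(c(810531L, 289024L, 164757L, 114316L), .Dim = c(2L, 2L),
  .Dimnames = list(landsat_2000_bin = c("0", "1"),
                   MOD12_2000_binForest = c("0", "1")), class = "table")

.Machine$integer.max   # largest value an R integer can hold
typeof(conf.mat)       # storage type of the original table

# Product of the first row total and first column total, in integer arithmetic.
# This overflows, so R returns NA (the warning is suppressed here).
int.product <- suppressWarnings(sum(conf.mat[1, ]) * sum(conf.mat[, 1]))
is.na(int.product)

# Copy the table and switch its storage mode to double.
# Attributes (dim, dimnames, class) are preserved.
conf.num <- conf.mat
storage.mode(conf.num) <- "double"
typeof(conf.num)

# The same product now fits comfortably in a double.
sum(conf.num[1, ]) * sum(conf.num[, 1])
```

The converted conf.num could then be handed to classAgreement in place of conf.mat.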

Finally, take a look at the documentation for ?kappa2. It requires an n*m matrix of n subjects and m raters, as described above, so it won't work with your (more compact) contingency table; your 2x2 table was read as 2 subjects rated by 2 raters.
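If you do want the subjects-by-raters layout, a contingency table can be expanded back into one row per subject. The following is a minimal base-R sketch on a small made-up table; the names tab, cells, ratings, rater1 and rater2 are hypothetical and chosen only for illustration.

```r
# Small hypothetical 2x2 table of counts (rows: rater 1, columns: rater 2).
tab <- as.table(matrix(c(20, 10, 5, 15), nrow = 2,
                       dimnames = list(rater1 = c("0", "1"),
                                       rater2 = c("0", "1"))))

# Long format: one row per cell, with the cell count in column Freq.
cells <- as.data.frame(tab, stringsAsFactors = FALSE)

# Repeat each cell's row Freq times to get one row per subject.
ratings <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("rater1", "rater2")]
rownames(ratings) <- NULL

dim(ratings)                            # subjects x raters
table(ratings$rater1, ratings$rater2)   # recovers the original counts

# 'ratings' now has the n*m shape that ?kappa2 describes, so, assuming irr
# is installed, it could be passed on as irr::kappa2(ratings).
```

For the table in the question this expansion would produce about 1.38 million rows, which is why working directly from the counts is the more economical route.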

Upvotes: 3

lockedoff

Reputation: 513

Do you need to know specifically why those fail? Here is a function that computes the statistic for a 2x2 table. I wrote it in a hurry, so I might clean it up later (kappa wiki):

kap <- function(x) {
  a <- (x[1,1] + x[2,2]) / sum(x)
  e <- (sum(x[1,]) / sum(x)) * (sum(x[,1]) / sum(x)) + (1 - (sum(x[1,]) / sum(x))) * (1 - (sum(x[,1]) / sum(x)))
  (a-e)/(1-e)
}

Tests/output:

> (x = matrix(c(20,5,10,15), nrow=2, byrow=T))
     [,1] [,2]
[1,]   20    5
[2,]   10   15
> kap(x)
[1] 0.4
> (x = matrix(c(45,15,25,15), nrow=2, byrow=T))
     [,1] [,2]
[1,]   45   15
[2,]   25   15
> kap(x)
[1] 0.1304348
> (x = matrix(c(25,35,5,35), nrow=2, byrow=T))
     [,1] [,2]
[1,]   25   35
[2,]    5   35
> kap(x)
[1] 0.2592593
> kap(conf.mat)
[1] 0.1258621
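The same calculation can be written for any square table of counts, not only 2x2. This is only a sketch under my own naming (kappa_from_table is a hypothetical helper, not taken from any package): it converts the counts to double first, so integer overflow cannot occur, then computes observed agreement from the diagonal and expected agreement from the marginal proportions.

```r
# Unweighted Cohen's kappa from a square contingency table of counts.
kappa_from_table <- function(tab) {
  stopifnot(length(dim(tab)) == 2, nrow(tab) == ncol(tab))
  # Work in double precision and convert counts to proportions.
  counts <- matrix(as.numeric(tab), nrow = nrow(tab))
  p <- counts / sum(counts)
  po <- sum(diag(p))                  # observed agreement
  pe <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# First test matrix from above; kap(x) gave 0.4 for it.
x <- matrix(c(20, 5, 10, 15), nrow = 2, byrow = TRUE)
kappa_from_table(x)
```

On a 2x2 table this should agree with kap, since the expected-agreement term reduces to the same two products of marginal proportions.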

Upvotes: 1
