Calculate Krippendorff's alpha with confidence interval for multilabel annotations in Python

Question

I'm wondering how to correctly calculate Krippendorff's alpha with a confidence interval for multilabel annotations in Python.

Multilabel/Single Label Krippendorff's alpha with NLTK

NLTK provides an implementation of Krippendorff's alpha in nltk.metrics.agreement (see the source code). It works for both single-label and multilabel annotations, provided you specify the appropriate distance function (MASI distance for multilabel, binary distance for single-label):

from nltk.metrics.agreement import AnnotationTask  
from nltk.metrics import masi_distance, binary_distance  

data = [
    ('c1', '1', frozenset({'v1', 'v2'})), ('c2', '1', frozenset({'v1'})),
    ('c1', '2', frozenset({'v2'})), ('c2', '2', frozenset({'v2', 'v3'})),
    ('c1', '3', frozenset()), ('c2', '3', frozenset({'v1'})),
    ('c1', '4', frozenset({'v3'})), ('c2', '4', frozenset({'v3', 'v2'}))
    ]

multilabel = True  # the labels above are sets of several values, so use MASI distance
distance = masi_distance if multilabel else binary_distance
task = AnnotationTask(data=data, distance=distance)
alpha = task.alpha()
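For reference, here is a minimal, dependency-free sketch of the pairable-values formula that AnnotationTask.alpha() implements, alpha = 1 - D_o / D_e: D_o sums the distances between values assigned to the same item (each item weighted by 1/(m_u - 1), where m_u is its number of values), and D_e sums the distances between all values in the data. The names krippendorff_alpha, masi_distance_safe and nominal_distance are hypothetical. One detail worth checking: the double sum over values also compares each value with itself, and as far as I can tell NLTK's masi_distance divides by the size of the union, so an empty label set such as frozenset() in the example data may raise a ZeroDivisionError there. The sketch instead assumes that two empty sets have distance 0.

```python
from collections import Counter, defaultdict
from itertools import product


def nominal_distance(a, b) -> float:
    """Binary (nominal) distance: 0 if the labels are identical, else 1."""
    return 0.0 if a == b else 1.0


def masi_distance_safe(a: frozenset, b: frozenset) -> float:
    """MASI distance between two label sets; empty vs. empty is treated as 0."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets agree perfectly
    inter = len(a & b)
    if a == b:
        m = 1.0   # identical sets
    elif inter == min(len(a), len(b)):
        m = 0.67  # one set contains the other
    elif inter > 0:
        m = 0.33  # partial overlap
    else:
        m = 0.0   # disjoint sets
    return 1.0 - (inter / len(union)) * m


def krippendorff_alpha(triples, distance) -> float:
    """Krippendorff's alpha from (coder, item, label) triples: 1 - D_o / D_e."""
    values_by_item = defaultdict(list)
    for _coder, item, label in triples:
        values_by_item[item].append(label)

    observed = 0.0        # sum over items of (pairwise distance sum) / (m_u - 1)
    overall = Counter()   # frequencies of all pairable values
    for values in values_by_item.values():
        m_u = len(values)
        if m_u < 2:
            continue  # an item with a single value cannot be paired
        counts = Counter(values)
        overall.update(counts)
        observed += sum(
            counts[c] * counts[k] * distance(c, k)
            for c, k in product(counts, repeat=2)
        ) / (m_u - 1)

    n = sum(overall.values())
    if n == 0:
        raise ValueError("no item has two or more values")
    d_o = observed / n
    d_e = sum(
        overall[c] * overall[k] * distance(c, k)
        for c, k in product(overall, repeat=2)
    ) / (n * (n - 1))
    if d_e == 0:
        return 1.0  # assumption: no variation at all counts as perfect agreement
    return 1.0 - d_o / d_e


data = [
    ('c1', '1', frozenset({'v1', 'v2'})), ('c2', '1', frozenset({'v1'})),
    ('c1', '2', frozenset({'v2'})), ('c2', '2', frozenset({'v2', 'v3'})),
    ('c1', '3', frozenset()), ('c2', '3', frozenset({'v1'})),
    ('c1', '4', frozenset({'v3'})), ('c2', '4', frozenset({'v3', 'v2'})),
]
print(krippendorff_alpha(data, masi_distance_safe))
```

Because it follows the same formula, the sketch should agree with AnnotationTask.alpha() whenever NLTK's own computation succeeds, but that is worth verifying on your own data.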

Calculate the Confidence Interval

Now, for the confidence interval, I've consulted several sources:

On Krippendorff's Alpha Coefficient (Gwet 2015)
K-Alpha Calculator–Krippendorff’s Alpha Calculator: A user-friendly tool for computing Krippendorff’s Alpha inter-rater reliability coefficient (Marzi et al. 2024)
krippendorffsalpha: An R Package for Measuring Agreement Using Krippendorff's Alpha Coefficient (Hugh 2021)

There seem to be two popular ways to calculate confidence intervals:

Bootstrapping (as in Hugh 2021, Marzi et al. 2024 and other R packages)
A closed-form solution by Gwet 2015 (which, however, does not seem to be supported by previous research)
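The bootstrap option can be illustrated without any library: draw as many units as the data contains, with replacement, recompute the statistic on each draw, and take the 2.5th and 97.5th percentiles of the replicates as a 95% interval (the percentile method). In the sketch below, bootstrap_percentile_ci and percentile are hypothetical names, and the choice of unit is left to the caller. For alpha, a natural unit would be one item together with all of its annotations; that is an assumption on my part, not something quoted from the sources above.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def bootstrap_percentile_ci(
    units: Sequence[T],
    statistic: Callable[[Sequence[T]], float],
    n_resamples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI: resample whole units with replacement, recompute the statistic."""
    rng = random.Random(seed)
    replicates = sorted(
        statistic(rng.choices(units, k=len(units))) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0  # e.g. 0.025 on each side for 95%
    return percentile(replicates, tail), percentile(replicates, 1.0 - tail)


# Usage with a simple statistic (the mean); for alpha, `units` would be the items
lower, upper = bootstrap_percentile_ci([1.0, 2.0, 3.0, 4.0, 5.0], lambda s: sum(s) / len(s))
print(lower, upper)
```

scipy.stats.bootstrap with method='percentile' follows the same idea; the sketch only makes explicit what is being resampled.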

What can you recommend? And what would a correct implementation look like?

I've tried implementing the bootstrap method as follows. Note: it assumes a pandas DataFrame input_df with the columns id (sample id), annotator (annotator id) and labels (containing lists of labels). It also assumes a _convert_to_nltk_format() function that converts the data into the same format as data above.

Is this implementation correct?

from nltk.metrics.agreement import AnnotationTask
from nltk.metrics import masi_distance, binary_distance
from scipy.stats import bootstrap
import numpy as np

if input_df['labels'].apply(lambda x: len(x) > 1).any():
    distance = masi_distance
else:
    distance = binary_distance
        
# Define callable function for bootstrap
def compute_alpha_for_sample(sample_indices):
    """ Calculate Krippendorff's alpha for a sample of the data."""
    resampled_df = input_df.iloc[sample_indices]
    nltk_resampled_input = _convert_to_nltk_format(
        resampled_df, 'id', 'annotator', 'labels')
    resampled_task = AnnotationTask(data=nltk_resampled_input, distance=distance)
    return resampled_task.alpha()

# Get indices
data_array = np.arange(len(input_df))

# Perform the bootstrap: resample with replacement, compute alpha for each sample,
# and use the percentile method for the confidence interval (Hugh 2021)
bootstrap_result = bootstrap(
    (data_array,),
    compute_alpha_for_sample,
    vectorized=False,
    n_resamples=1000,
    confidence_level=0.95,
    method='percentile'
)
lower_bound, upper_bound = bootstrap_result.confidence_interval
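The code above relies on _convert_to_nltk_format() without showing it, and its bootstrap resamples row positions of input_df, i.e. individual annotations. Below is a hypothetical version of that helper, plus a hypothetical resample_items function for the alternative of resampling whole items: each drawn item keeps all of its annotations, and repeated draws get fresh ids, so that AnnotationTask, which groups values by item id, treats them as separate units. With row-level resampling, an item can instead lose one of its annotations or contain the same annotation twice. Both functions are sketches under these assumptions, not a verified fix. Labels are converted to frozensets because NLTK counts label values in a frequency distribution, which requires hashable values.

```python
import random
from collections import defaultdict


def _convert_to_nltk_format(df, id_col, annotator_col, labels_col):
    """Hypothetical helper: (coder, item, frozenset(labels)) triples for AnnotationTask.

    Works with a pandas DataFrame or any mapping of column name -> sequence.
    """
    return [
        (str(coder), str(item), frozenset(labels))  # frozenset: hashable, order-free
        for item, coder, labels in zip(df[id_col], df[annotator_col], df[labels_col])
    ]


def resample_items(triples, rng):
    """Draw items with replacement, keeping all annotations of an item together.

    Each draw gets a fresh item id so that duplicated items remain separate units.
    """
    by_item = defaultdict(list)
    for coder, item, labels in triples:
        by_item[item].append((coder, labels))
    item_ids = sorted(by_item)
    resampled = []
    for draw, item in enumerate(rng.choices(item_ids, k=len(item_ids))):
        resampled.extend(
            (coder, f"{item}#{draw}", labels) for coder, labels in by_item[item]
        )
    return resampled


# Usage with a plain dict standing in for input_df
frame = {"id": [1, 1, 2, 2], "annotator": ["a", "b", "a", "b"],
         "labels": [["x"], ["x", "y"], [], ["y"]]}
triples = _convert_to_nltk_format(frame, "id", "annotator", "labels")
print(resample_items(triples, random.Random(0)))
```

Inside a loop over n_resamples, one would then compute AnnotationTask(data=resample_items(triples, rng), distance=distance).alpha() for each draw and take percentiles of the results.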
