I'm wondering how to correctly calculate Krippendorff's alpha, with confidence intervals, for multilabel data in Python.
NLTK provides an implementation of Krippendorff's alpha in nltk.metrics.agreement (see the source code). It works for both single-label and multilabel data, as long as you specify the correct distance function (MASI distance for multilabel, binary distance for single-label):
```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics import masi_distance, binary_distance

# Triples of (coder, item, labels); labels are frozensets so they are hashable
data = [
    ('c1', '1', frozenset({'v1', 'v2'})), ('c2', '1', frozenset({'v1'})),
    ('c1', '2', frozenset({'v2'})), ('c2', '2', frozenset({'v2', 'v3'})),
    ('c1', '3', frozenset()), ('c2', '3', frozenset({'v1'})),
    ('c1', '4', frozenset({'v3'})), ('c2', '4', frozenset({'v3', 'v2'}))
]

multilabel = True  # set to False for single-label data
distance_func = masi_distance if multilabel else binary_distance
task = AnnotationTask(data=data, distance=distance_func)
alpha = task.alpha()
```
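For reference, here is the standard form of Krippendorff's alpha with a pluggable distance δ, which is where `masi_distance` or `binary_distance` comes in. As far as I can tell from the NLTK source, `AnnotationTask.alpha()` computes this form and skips items with fewer than two annotations:

$$
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \frac{1}{n} \sum_{u} \frac{1}{m_u - 1} \sum_{i \neq j} \delta(v_{ui}, v_{uj}), \qquad
D_e = \frac{1}{n(n-1)} \sum_{c \neq k} n_c \, n_k \, \delta(c, k)
$$

Here u runs over items with m_u ≥ 2 annotations, v_{ui} is the label set from the i-th annotation of item u, n = Σ_u m_u is the number of pairable annotations, and n_c is how often value c occurs among them. D_o only ever compares annotations of the same item, which matters when deciding what a bootstrap should resample.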
Now, for the confidence interval, I've consulted several sources:
There seem to be two popular ways to calculate confidence intervals:
Which would you recommend? And what would a correct implementation look like?
I've tried implementing the bootstrapping method as shown below. Note that it assumes a pandas DataFrame `input_df` with the columns `id` (sample id), `annotator` (annotator id) and `labels` (a list of labels per annotation). It also assumes a `_convert_to_nltk_format()` function that converts the data into the format of `data` above.
Is the following bootstrap implementation correct?
```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics import masi_distance, binary_distance
from scipy.stats import bootstrap
import numpy as np

# Use MASI distance if any sample has more than one label, else binary distance
if input_df['labels'].apply(lambda x: len(x) > 1).any():
    distance = masi_distance
else:
    distance = binary_distance

# Define callable function for bootstrap
def compute_alpha_for_sample(sample_indices):
    """Calculate Krippendorff's alpha for a sample of the data."""
    # sample_indices are row positions in input_df, i.e. individual annotations
    resampled_df = input_df.iloc[sample_indices]
    nltk_resampled_input = _convert_to_nltk_format(
        resampled_df, 'id', 'annotator', 'labels')
    resampled_task = AnnotationTask(data=nltk_resampled_input, distance=distance)
    return resampled_task.alpha()

# Get row indices
data_array = np.arange(len(input_df))

# Perform bootstrap: resample with replacement, calculate alpha for each sample,
# choosing the percentile method for the confidence interval (Hugh 2021)
bootstrap_result = bootstrap(
    (data_array,),
    compute_alpha_for_sample,
    vectorized=False,
    n_resamples=1000,
    confidence_level=0.95,
    method='percentile'
)
lower_bound, upper_bound = bootstrap_result.confidence_interval
```
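One thing worth checking: `data_array` indexes rows of `input_df`, so each resample draws individual annotations rather than items, and an item can end up with only some of its annotations, or the same annotation twice. Since observed disagreement is computed within items, a common alternative is to resample whole items (units) with replacement. Feeding such a resample to NLTK would also need fresh item ids, because `AnnotationTask` groups annotations by item id and would merge duplicated items. Below is a minimal, self-contained sketch of an item-level percentile bootstrap using only the standard library and NumPy. The names `masi_like_distance`, `krippendorff_alpha` and `bootstrap_alpha_ci` are hypothetical. The distance is meant to mirror the weights in NLTK's `masi_distance`, except that two empty label sets get distance 0; that is my assumption, since NLTK's version divides by the size of the union, which is zero in that case (the example data contains `frozenset()`). The sketch does not settle whether percentile or another interval method is preferable.

```python
from collections import Counter

import numpy as np


def masi_like_distance(a, b):
    """MASI-style distance between two label sets (frozensets).

    Same partial-credit weights as nltk's masi_distance, but two empty sets
    are treated as identical (an assumption) instead of dividing by zero.
    """
    if not a and not b:
        return 0.0
    inter, union = len(a & b), len(a | b)
    if a == b:
        m = 1.0   # identical sets
    elif inter == min(len(a), len(b)):
        m = 0.67  # one set contains the other
    elif inter > 0:
        m = 0.33  # partial overlap
    else:
        m = 0.0   # disjoint sets
    return 1.0 - (inter / union) * m


def krippendorff_alpha(units, distance):
    """Krippendorff's alpha for a list of units (items).

    Each unit is a list of hashable values (e.g. frozensets), one per
    annotation. Units with fewer than two annotations are skipped.
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)  # number of pairable values
    # Observed disagreement: ordered pairs within each unit, weighted 1/(m_u - 1)
    d_o = sum(
        sum(distance(u[i], u[j]) for i in range(len(u)) for j in range(len(u)) if i != j)
        / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: ordered pairs of distinct values pooled over all units
    freqs = Counter(v for u in units for v in u)
    d_e = sum(
        fa * fb * distance(a, b)
        for a, fa in freqs.items()
        for b, fb in freqs.items()
        if a != b
    ) / (n * (n - 1))
    if d_e == 0:
        # Only one distinct value overall; as far as I can tell nltk also returns 1 here
        return 1.0
    return 1.0 - d_o / d_e


def bootstrap_alpha_ci(units, distance, n_resamples=1000, confidence_level=0.95, seed=0):
    """Point estimate plus a percentile bootstrap CI, resampling whole units."""
    rng = np.random.default_rng(seed)
    units = [u for u in units if len(u) >= 2]
    estimates = np.empty(n_resamples)
    for r in range(n_resamples):
        # Draw unit indices with replacement; a unit drawn twice counts as two units
        idx = rng.integers(0, len(units), size=len(units))
        estimates[r] = krippendorff_alpha([units[i] for i in idx], distance)
    tail = (1.0 - confidence_level) / 2 * 100
    low, high = np.percentile(estimates, [tail, 100 - tail])
    return krippendorff_alpha(units, distance), (float(low), float(high))
```

With `input_df`, the units could be built as `[[frozenset(x) for x in group] for _, group in input_df.groupby('id')['labels']]` and passed as `bootstrap_alpha_ci(units, masi_like_distance)`. Because units are kept as separate list entries, a unit drawn twice is automatically treated as two units, which avoids the id-merging issue.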