nantoku

Reputation: 59

Merge data frames for Cohen's kappa

I'm trying to analyze some data using R, but I'm not very familiar with R (yet), so I'm stuck.

What I'm trying to do is reshape my input data so that I can use it to calculate Cohen's kappa. The problem is that rater_2 has several ratings for some of the items, and I need to select one of them. If one of rater_2's ratings for an item matches rater_1's rating, that rating should be chosen; otherwise, any of rater_2's ratings for that item can be used.

I tried

unique(merge(rater_1, rater_2, all.x=TRUE))

which brings me close, but if the ratings between the two raters diverge, only one is kept.

So, my question is, how do I get from

item rating_1
1    3
2    5
3    4 

item rating_2
1    2
1    3
2    4
2    1
2    2
3    4 
3    2

to

item rating_1 rating_2
1    3         3
2    5         4
3    4         4

?

Upvotes: 1

Views: 230

Answers (1)

nograpes

Reputation: 18323

There are some fancy ways to do this, but it might be more instructive to combine a few basic techniques. In general, your question should include an easy way to generate your data, like this:

# Create some sample data
set.seed(1)
id<-rep(1:50)
rater_1<-sample(1:5,50,replace=TRUE)
df1<-data.frame(id,rater_1)

id<-rep(1:50,each=2)
rater_2<-sample(1:5,100,replace=TRUE)
df2<-data.frame(id,rater_2)

Now, here is one simple technique for doing this.

# Merge together the data frames.
all.merged<-merge(df1,df2)
#   id rater_1 rater_2
# 1  1       2       3
# 2  1       2       5
# 3  2       2       3
# 4  2       2       2
# 5  3       3       1
# 6  3       3       1

# Find the ones that are equal.
same.rating<-all.merged[all.merged$rater_2==all.merged$rater_1,]
# Consider id 44, sometimes they match twice.
# So remove duplicates.
same.rating<-same.rating[!duplicated(same.rating),]
# Find the ones that never matched.
not.same.rating<-all.merged[!(all.merged$id %in% same.rating$id),]
# Pick one. I chose to pick the maximum.
picked.rating<-aggregate(rater_2~id+rater_1,not.same.rating,max)
# Stick the two together.
result<-rbind(same.rating,picked.rating)
result<-result[order(result$id),] # Sort

#     id rater_1 rater_2
# 27   1       2       5
# 4    2       2       2
# 33   3       3       1
# 44   4       5       3
# 281  5       2       4
# 11   6       5       5
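As a check, the same steps can be applied to the small tables from the question. This sketch assumes the question's column names (item, rating_1, rating_2) and deduplicates on item alone rather than on the whole row, which is equivalent here. With max() as the tie-breaker it happens to reproduce the desired output exactly, because 4 is the largest of rater_2's ratings (4, 1, 2) for item 2.

```r
# The question's example data
rater_1 <- data.frame(item = c(1, 2, 3), rating_1 = c(3, 5, 4))
rater_2 <- data.frame(item = c(1, 1, 2, 2, 2, 3, 3),
                      rating_2 = c(2, 3, 4, 1, 2, 4, 2))

# All rater_1/rater_2 pairs, joined on the shared column "item"
all.merged <- merge(rater_1, rater_2)

# Rows where the two raters agree, keeping one row per item
same.rating <- all.merged[all.merged$rating_1 == all.merged$rating_2, ]
same.rating <- same.rating[!duplicated(same.rating$item), ]

# Items with no agreement: take the maximum of rater_2's ratings
not.same.rating <- all.merged[!(all.merged$item %in% same.rating$item), ]
picked.rating <- aggregate(rating_2 ~ item + rating_1, not.same.rating, max)

# Combine, sort by item and reset the row names
result <- rbind(same.rating, picked.rating)
result <- result[order(result$item), ]
rownames(result) <- NULL
print(result)
#   item rating_1 rating_2
# 1    1        3        3
# 2    2        5        4
# 3    3        4        4
```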

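A more compact way is to split the merged data by id and, for each id, keep the first row where the raters agree, or a random row if they never agree:

same.or.random<-function(x) {
  # Indices of the rows where the two raters agree
  matched<-which(x$rater_1==x$rater_2)
  if(length(matched)>0) x[matched[1],]
  else x[sample(nrow(x),1),]
}
merged<-merge(df1,df2)
do.call(rbind,by(merged,merged$id,same.or.random))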
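Once there is exactly one rating per rater per item, Cohen's kappa can be computed from the two rating columns. The sketch below uses the standard unweighted formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The helper name cohens_kappa is hypothetical; for real analyses a packaged function such as kappa2() from the irr package may be preferable.

```r
# Hypothetical helper: unweighted Cohen's kappa for two raters.
# x, y: ratings of the same items, in the same order.
cohens_kappa <- function(x, y) {
  # Use the same set of categories for both raters so the table is square
  lev <- sort(unique(c(x, y)))
  tab <- table(factor(x, levels = lev), factor(y, levels = lev))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  # Note: undefined (NaN) if p_e == 1, i.e. everyone uses one category
  (p_o - p_e) / (1 - p_e)
}

# The question's desired output: rating_1 = 3, 5, 4 and rating_2 = 3, 4, 4
cohens_kappa(c(3, 5, 4), c(3, 4, 4))
# [1] 0.5
```

With the answer's simulated data, the equivalent call would be cohens_kappa(result$rater_1, result$rater_2).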

Upvotes: 0
