Reputation: 43
I am having trouble with the following problem, which concerns the data frame 'annotations' below (simplified by limiting the number of columns).
require(irr)
# data
annotations <- read.table(text = "Obj1 Obj2 Obj3
Rater1 a b c
Rater2 a b b
Rater3 a b c", header = TRUE, stringsAsFactors = FALSE)
I would like to apply the function agree from the irr package to all combinations (not permutations) of rows, resulting in the following.
Agreement rater 1-2: 67%
Agreement rater 1-3: 100%
Agreement rater 2-3: 67%
I need to run a function on all combinations of rows and the function would need to access a number of/all columns.
I have worked out part of the answer: I can generate the list of combinations with combn(rownames(annotations), 2), but I don't see how to use this list without writing inefficient for loops.
I have tried apply, as in apply(annotations, 1, agree), but I can only get this to work on one row at a time, not on the combinations mentioned above.
Does anyone have an idea how to proceed?
UPDATE: The following solution, based on your suggestions, works. (I have used kappa2 from the irr package instead of agree, but the solution to the main question remains the same.)
require(irr) #require the irr library for agreement calculations
annotations <- read.table(text = "Obj1 Obj2 Obj3
Rater1 a b c
Rater2 a b b
Rater3 a b c
Rater4 c a a", header = TRUE, stringsAsFactors = FALSE)
annotations <- t(annotations) #transpose annotations (rows become columns and vice versa)
# For every pair of columns (combination of 2 raters), apply kappa2 to the two-column matrix of their ratings.
# combn() collects the value returned by FUN for each pair, so FUN must not refer to kappa_list itself
# (it does not exist yet while combn() is running).
kappa_list <- combn(colnames(annotations), 2, FUN = function(x) kappa2(matrix(c(annotations[, x[1]], annotations[, x[2]]), ncol = 2))$value)
kappa_list # display the kappa values, one per pair of raters
Upvotes: 3
Views: 2589
Reputation: 8863
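A fast approach is to build two index vectors that hold, for every combination of rows, the row number of the first row and of the second row. Indexing the matrix with each vector gives one matrix for the rows on one side of each pair and another matrix for the rows on the other side, and a vectorized function can then be applied to the two matrices. The benchmark below compares this with several alternatives: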
library(microbenchmark)
es=1:3
r=sapply(es,function(e){
nrow=10^e
ncol=8
m=matrix(rnorm(ncol*nrow),nrow)
b=microbenchmark(times=100,
vectorized={
z=1:nrow
i1=rep(z[-nrow],rev(z[-nrow]))
i2=unlist(lapply(z[-1],function(x)x:nrow))
o=m[i1,]+m[i2,]
},
vectorized_without_saving_sequence_in_variable={
i1=rep(1:(nrow-1),(nrow-1):1)
i2=unlist(lapply(2:nrow,function(x)x:nrow))
o=m[i1,]+m[i2,]
},
vectorized_combn={
c=combn(nrow,2)
o=m[c[1,],]+m[c[2,],]
},
vectorized_for_loops={
ncomb=nrow*(nrow-1)/2
i1=integer(ncomb)
i2=integer(ncomb)
n=1;for(i in 1:(nrow-1))for(j in(i+1):nrow){i1[n]=i;i2[n]=j;n=n+1}
o=m[i1,]+m[i2,]
},
for_loops={
o=matrix(nrow=nrow*(nrow-1)/2,ncol=ncol)
n=1;for(i in 1:(nrow-1))for(j in(i+1):nrow){o[n,]=m[i,]+m[j,];n=n+1}
},
combn_with_function_argument={o=t(combn(nrow,2,function(x)m[x[1],]+m[x[2],]))},
combn_with_apply={o=t(apply(combn(nrow,2),2,function(x)m[x[1],]+m[x[2],]))}
)
a=aggregate(b$time,list(b$expr),median)
setNames(a[,2],gsub(" ","",a[,1]))/1e6
})
r2=apply(r,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
r3=apply(rbind(paste0("1e",es),r2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(r3,c("",rownames(r))),1,paste,collapse=" "))
Median time in ms for input with 10, 100, and 1000 rows:
1e1 1e2 1e3
0.022 0.58 31 vectorized
0.017 0.58 34 vectorized_without_saving_sequence_in_variable
0.052 3.41 331 vectorized_combn
0.068 7.12 718 vectorized_for_loops
0.095 10.24 1058 for_loops
0.126 11.84 1194 combn_with_function_argument
0.182 14.99 1636 combn_with_apply
The following is a faster alternative to c=combn(nrow,2);i1=c[1,];i2=c[2,] (see "How to efficiently generate lower triangle indices of a symmetric matrix"):
z=1:nrow
i1=rep(z[-nrow],rev(z[-nrow]))
i2=unlist(lapply(z[-1],function(x)x:nrow))
Alternatively, this version is only slightly slower for large inputs but easier to understand:
i1=rep(1:(nrow-1),(nrow-1):1)
i2=unlist(lapply(2:nrow,function(x)x:nrow))
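To make the two index vectors concrete, here is a small check for a 4-row input. It is only a sketch: the names n, pairs, and pairs_combn are introduced here for illustration (n plays the role of nrow above), and the comparison relies on the default order in which combn lists its pairs.

```r
n <- 4                                        # number of rows (nrow in the code above)
i1 <- rep(1:(n - 1), (n - 1):1)               # first row of each pair:  1 1 1 2 2 3
i2 <- unlist(lapply(2:n, function(x) x:n))    # second row of each pair: 2 3 4 3 4 4

pairs <- cbind(i1, i2)                        # one pair of row numbers per line
pairs_combn <- t(combn(n, 2))                 # the same pairs as generated by combn

# Apply a vectorized operation (here: addition) to every pair of rows at once
m <- matrix(1:8, nrow = n)
o <- m[i1, ] + m[i2, ]
```

Each row of o then holds the result for one pair, in the same order as the columns of combn(n, 2).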
Upvotes: 0
Reputation: 1609
This answer applies the function f(x) := 2x + 5 to all entries of a column that corresponds to the combinations. Instead of f(x) := 2x + 5, you can substitute your own function:
Step 1: Build the matrix of combinations. (The following was written for my own case.)
causalitycombinations <- function (nvars, ncausers, ndependents)
{
independents <- combn(nvars, ncausers)
swingnumber <- dim(combn(nvars - ncausers, ndependents))[[2]]
numberofallcombinations <- dim(combn(nvars, ncausers))[[2]] * swingnumber
dependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ndependents)
for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
dependents[(swingnumber * (i - 1) + 1):(swingnumber * i), ] <- t(combn(setdiff(seq(1:nvars), independents[, i]), ndependents))
}
swingedindependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ncausers)
for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
for (j in as.integer(1:swingnumber)) {
swingedindependents[(i - 1) * swingnumber + j, ] <- independents[, i]
}
}
independentsdependents <- cbind(swingedindependents, dependents)
others <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = nvars - ncausers - ndependents)
for (i in as.integer(1:((dim(combn(nvars, ncausers))[[2]]) * swingnumber))) {
others[i, ] <- setdiff(seq(1:nvars), independentsdependents[i, ])
}
causalitiestemplate <- cbind(independentsdependents, others)
causalitiestemplate
}
causalitycombinations(3,1,1)
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 1 3 2
#[3,] 2 1 3
#[4,] 2 3 1
#[5,] 3 1 2
#[6,] 3 2 1
Step 2: Append the data to the combinations (one can append multiple columns; I added only one for simplicity).
set.seed(1)
mydataframer <- cbind(causalitycombinations(3,1,1), rnorm(6))
mydataframer
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 -0.6264538
#[2,] 1 3 2 0.1836433
#[3,] 2 1 3 -0.8356286
#[4,] 2 3 1 1.5952808
#[5,] 3 1 2 0.3295078
#[6,] 3 2 1 -0.8204684
Step 3: Apply the function via lapply, taking into account the number of rows of the combined matrix.
lapply(1: dim(mydataframer)[[1]], function(x) {2*mydataframer[x,4] + 5})
# 3.747092
# 5.367287
# 3.328743
# 8.190562
# 5.659016
# 3.359063
That is it.
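As a side note on Step 3, when the function is itself vectorized (as 2x + 5 is), the loop over row numbers is not strictly needed. The following sketch compares three ways of getting the same numbers; values and f are names introduced here for illustration, with values standing in for column 4 of mydataframer.

```r
set.seed(1)
values <- rnorm(6)            # stands in for mydataframer[, 4]
f <- function(x) 2 * x + 5    # the example function from this answer

# Row-by-row, as in Step 3, but flattened to a plain numeric vector
via_lapply <- unlist(lapply(seq_along(values), function(i) f(values[i])))

# sapply simplifies the result to a vector directly
via_sapply <- sapply(values, f)

# Because f is vectorized, it can be applied to the whole column at once
vectorized <- f(values)
```

All three give the same six numbers. For a function that is not vectorized, the lapply or sapply form remains the one to use.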
By the way, the ?irr::agree help file states that the n*m ratings matrix/data frame is "n subjects, m raters". Hence, the questioner may design this better via:
annotations <- read.table(text = "Rater1 Rater2 Rater3
Subject1 a a a
Subject2 b b b
Subject3 c b c", header = TRUE, stringsAsFactors = FALSE)
annotations
# Rater1 Rater2 Rater3
# Subject1 a a a
# Subject2 b b b
# Subject3 c b c
Also, it still needs to be clarified whether the questioner wants to loop over all possible annotation tables of this shape. If that is the case, i.e.,
annotations
# Rater1 Rater2 Rater3
# Subject1 a a a
# Subject2 a a a
# Subject3 a a a
annotations
# Rater1 Rater2 Rater3
# Subject1 a a b
# Subject2 a a a
# Subject3 a a a
annotations
# Rater1 Rater2 Rater3
# Subject1 a a c
# Subject2 a a a
# Subject3 a a a
annotations
# Rater1 Rater2 Rater3
# Subject1 a b a
# Subject2 a a a
# Subject3 a a a
# .... after consuming all Subject1 possibilities, this time consuming Subject2 possibilities,
annotations
# Rater1 Rater2 Rater3
# Subject1 a a a
# Subject2 a a b
# Subject3 a a a
and then Subject3 possibilities, and thereby collecting all the possibilities of agreements, then the problem changes completely.
The irr::agree function is designed for multiple rows (subjects). Observe the example from its help file:
data(video)
video
# rater1 rater2 rater3 rater4
# 1 4 4 3 4
# 2 4 4 4 5
# ..............................
# 20 4 5 5 4
agree(video) # Simple percentage agreement
# Percentage agreement (Tolerance=0)
# Subjects = 20; Raters = 4; %-agree = 35
agree(video, 1) # Extended percentage agreement
# Percentage agreement (Tolerance=1)
# Subjects = 20; Raters = 4; %-agree = 90
whereas if the questioner wants to apply agree row-wise (only 1 subject!), the %-agree is 0 for each of the rows shown here:
agree(video[1,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0
...
agree(video[20,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0
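For the pairwise comparison the questioner asked about, the subjects-by-raters layout can be combined with combn over the column names. The sketch below uses base R only. pct_agree is a hypothetical helper written for this illustration, not irr::agree, and ratings holds the question's data transposed so that subjects are rows and raters are columns.

```r
# Ratings from the question, laid out as subjects (rows) by raters (columns)
ratings <- data.frame(
  Rater1 = c("a", "b", "c"),
  Rater2 = c("a", "b", "b"),
  Rater3 = c("a", "b", "c"),
  row.names = c("Subject1", "Subject2", "Subject3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not irr::agree): percentage of subjects on which
# two raters assign the same label
pct_agree <- function(r1, r2) 100 * mean(r1 == r2)

# All unordered pairs of raters, kept as a list so they can be named
rater_pairs <- combn(colnames(ratings), 2, simplify = FALSE)

agreement <- sapply(rater_pairs, function(p) pct_agree(ratings[[p[1]]], ratings[[p[2]]]))
names(agreement) <- sapply(rater_pairs, paste, collapse = " - ")
round(agreement)
```

Because each pair is compared across all three subjects, the percentages are meaningful, unlike a call on a single row.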
Upvotes: 0
Reputation: 9570
You are close; you just need to use apply on the result of combn instead. I don't know exactly which function you are referring to, but this should work the same if you plug in your function.
First, save the results as a list instead, because it is easier to add names (which I am adding by combining the two entries together):
toCheck <- combn(rownames(annotations), 2, simplify = FALSE)
names(toCheck) <-
sapply(toCheck, paste, collapse = " - ")
Then, use sapply to work through your combinations. Here, I am using mean to do the comparison, but use what you need here. If you are returning more than a single value, use lapply, then work with the result to print as desired:
sapply(toCheck, function(x){
mean(annotations[x[1], ] == annotations[x[2], ])
})
Which returns:
Rater1 - Rater2 Rater1 - Rater3 Rater2 - Rater3
0.6666667 1.0000000 0.6666667
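For the case where more than one value is returned per pair, a minimal sketch of the lapply route could look as follows. The names res, res_table, n_agree, n_objects, and prop_agree are made up for this illustration, and the data are the question's first example.

```r
annotations <- read.table(text = "Obj1 Obj2 Obj3
Rater1 a b c
Rater2 a b b
Rater3 a b c", header = TRUE, stringsAsFactors = FALSE)

toCheck <- combn(rownames(annotations), 2, simplify = FALSE)
names(toCheck) <- sapply(toCheck, paste, collapse = " - ")

# Return several values per pair; lapply keeps them together in a named list
res <- lapply(toCheck, function(x) {
  same <- unlist(annotations[x[1], ]) == unlist(annotations[x[2], ])
  c(n_agree = sum(same), n_objects = length(same), prop_agree = mean(same))
})

# Bind the per-pair vectors into one matrix, one row per pair
res_table <- do.call(rbind, res)

# Print in a form similar to the one requested in the question
writeLines(sprintf("Agreement %s: %.0f%%", rownames(res_table), 100 * res_table[, "prop_agree"]))
```

Keeping the pairs as a named list means the row names of res_table identify each comparison without extra bookkeeping.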
Upvotes: 6