Reputation: 4227
Here is my small dataset and here is a function:
dat <- data.frame (
A1 = c("AA", "AA", "AA", "AA"),
B1 = c("BB", "BB", "AB", "AB"),
C1 = c("AB", "BB", "AA", "AB"))
The function
syfun <- function (x, y){
if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
sxy = 0
}
return(sxy)
}
out <- rep (NA, NROW(dat))
for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,1])
}
mean(out)
1
Here what I am trying to do is apply the function with first column (variable A) with same variable (variable A1) and average the output value. I want to save this output to a cell of matrix.
Similarly between A1 and B1.
for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,2])
}
mean(out)
0.25
Now similar to correlation matrix, I want to save all possible combination between variable to make a matrix like.
A1 B1 C1
A1 1.0 0.25 0.5
B1 0.25 1.0 NA
C1 0.5 NA 1.0
Edits: More complete function that do not produce NAs
syfun <- function (x, y){
sxy <- NA
if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
sxy = 0
}
if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
sxy = 0.5
}
if(x == "AB" & y == "AB") {
sxy = 0.5
}
return(sxy)
}
Upvotes: 4
Views: 389
Reputation: 81693
First, your function syfun
has to return NA
if there is no match. Hence, I added a line at the top of the function:
syfun <- function (x, y){
sxy <- NA
if(x == "AA" & y == "AA" | x == "BB" & y == "AA"){
sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
sxy = 0
}
return(sxy)
}
Second, you can use outer
to apply the function to all combinations. You need to use Vectorize
to vectorize the function:
mat <- outer(names(dat), names(dat), function(x, y)
Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))
Third, replace the elements on the diagonal with 1
:
diag(mat) <- 1
Fourth, set row and column names:
dimnames(mat) <- list(names(dat), names(dat))
The result:
A1 B1 C1
A1 1.00 0.25 0.5
B1 0.25 1.00 NA
C1 0.50 NA 1.0
Upvotes: 3
Reputation: 3110
From your example it looks like you just want to know the proportion of As in one compared with the As in another to compute their similarity. If that is the case then: (I'm assuming these are genes?)
dat <- data.frame (
A1 = c("AA", "AA", "AA", "AA"),
B1 = c("BB", "BB", "AB", "AB"),
C1 = c("AB", "BB", "AA", "AB"))
## this function takes the columns from dat, pastes all the genes together, then counts the number of each that appears. It then divides the smaller by the larger to give you a percent similar (only does it for 'A' right now, but I could expand that to more genes if necessary)
fun <- function(x,y){
x.prop <- table(unlist(strsplit(Reduce(paste0, x),'*')))
y.prop <- table(unlist(strsplit(Reduce(paste0, y),'*')))
ans <- ifelse(x.prop['A']>y.prop['A'], y.prop['A']/x.prop['A'], x.prop['A']/y.prop['A'])
return(ans)
}
final_mat <- matrix(ncol=3,nrow=3) ## creates an empty final matrix
colnames(final_mat) <- colnames(dat)
rownames(final_mat) <- colnames(dat)
### this applies 'fun' to each of the 2 combinations of column names
final_mat[upper.tri(final_mat)] <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))
final_mat[lower.tri(final_mat)] <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))
diag(final_mat) <- 1
final_mat
A1 B1 C1
A1 1.00 0.25 0.5
B1 0.25 1.00 0.5
C1 0.50 0.50 1.0
Upvotes: 2