Reputation: 3396
I am trying to get a frequency table from this dataframe:
tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
.Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
class = "data.frame", row.names = c(NA, -3L))
# (originally read from a file: tmp2 <- read.csv("tmp2.csv", sep=";"))
> tmp2
  a1 a2 a3 b1 b2 b3
1  1  1  0  1  1  0
2  0  0  1  0  0  1
3  0  1  0  1  0  1
I tried to get a frequency table as follows:
table(tmp2[,1:3], tmp2[,4:6])
But I get:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Expected output:
Note: the result does not have to be a square matrix. For instance, I should be able to add b4 and b5 while keeping a1, a2 and a3.
Upvotes: 5
Views: 187
Reputation: 24510
An option:
matrix(colSums(tmp2[,rep(1:3,3)] & tmp2[,rep(4:6,each=3)]),
ncol=3,nrow=3,
dimnames=list(colnames(tmp2)[1:3],colnames(tmp2)[4:6]))
# b1 b2 b3
#a1 1 1 0
#a2 2 1 1
#a3 0 0 1
If you have a different number of a and b columns, you can try:
acols<-1:3 #state the indices of the a columns
bcols<-4:6 #same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[,rep(acols,length(bcols))] & tmp2[,rep(bcols,each=length(acols))]),
ncol=length(bcols),nrow=length(acols),
dimnames=list(colnames(tmp2)[acols],colnames(tmp2)[bcols]))
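Because every column holds only 0/1 values, the same counts can also be sketched as a matrix cross product: `crossprod(A, B)` computes `t(A) %*% B`, so each cell is the number of rows where both the a column and the b column are 1. This is a minimal sketch that rebuilds the example data and reuses the `acols`/`bcols` indices from above; `res` is just an illustrative name.

```r
# Rebuild the example data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

acols <- 1:3  # indices of the a columns
bcols <- 4:6  # indices of the b columns

# t(A) %*% B: cell [i, j] counts the rows where a_i and b_j are both 1.
# drop = FALSE keeps a data frame even if only one column is selected.
res <- crossprod(as.matrix(tmp2[, acols, drop = FALSE]),
                 as.matrix(tmp2[, bcols, drop = FALSE]))
res
#    b1 b2 b3
# a1  1  1  0
# a2  2  1  1
# a3  0  0  1
```

This only works as a count when the columns are strictly 0/1 indicators; with other values the cells would be sums of products rather than co-occurrence counts.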
Upvotes: 5
Reputation: 57220
Here's a possible solution:
aIdxs <- 1:3
bIdxs <- 4:6 # use 4:7 only after a b4 column has been added to tmp2
# init matrix
m <- matrix(0,
nrow = length(aIdxs), ncol=length(bIdxs),
dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))
# create all combinations of a's and b's column indexes
idxs <- expand.grid(aIdxs,bIdxs)
# for each row and for each combination we add 1
# to the matrix if both the a and the b column are 1
for(r in seq_len(nrow(tmp2))){
m <- m + matrix(apply(idxs,1,function(x){ all(tmp2[r,x]==1) }),
nrow=length(aIdxs), byrow=FALSE)
}
> m
   b1 b2 b3
a1  1  1  0
a2  2  1  1
a3  0  0  1
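The row-by-row loop can also be sketched column-wise: for each (a, b) index pair from `expand.grid`, count the rows where both columns equal 1. This is only an illustrative variant under the same assumptions (0/1 columns, indices 1:3 and 4:6); the names `a`, `b`, `counts` and `m2` are not from the code above.

```r
# Rebuild the example data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

aIdxs <- 1:3
bIdxs <- 4:6

# All (a, b) column index pairs; the a index varies fastest,
# which matches column-major filling of the result matrix
idxs <- expand.grid(a = aIdxs, b = bIdxs)

# For each pair, count the rows where both columns are 1
counts <- mapply(function(a, b) sum(tmp2[[a]] == 1 & tmp2[[b]] == 1),
                 idxs$a, idxs$b)

m2 <- matrix(counts, nrow = length(aIdxs),
             dimnames = list(colnames(tmp2)[aIdxs], colnames(tmp2)[bIdxs]))
m2
#    b1 b2 b3
# a1  1  1  0
# a2  2  1  1
# a3  0  0  1
```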
Upvotes: 1
Reputation: 2366
Here is another possible solution. Your input is a bit tricky for 'table': you have two sets of columns, 'a' and 'b', with binary indicators in each row, and you want to count, for every pair of one 'a' column and one 'b' column, the rows in which both are 1. Below is a generalized (though perhaps not very elegant) function that works with different numbers of 'a' and 'b' columns:
tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L,
1L, 0L), b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L,
1L)), .Names = c("a1", "a2", "a3", "b1", "b2", "b3"), class = "data.frame", row.names = c(NA,
-3L))
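# for each 'a' column p, sum p*q over the rows for every 'b' column q;
# "^a"/"^b" match only names starting with a/b, drop=FALSE keeps single columns as data frames
fun = function(x) t(do.call("cbind", lapply(x[,grep("^a", colnames(x)), drop=FALSE],
function(p) rowSums(do.call("rbind", lapply(x[,grep("^b", colnames(x)), drop=FALSE],
function(q) q*p ))))))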
fun(tmp2)
#> fun(tmp2)
# b1 b2 b3
#a1 1 1 0
#a2 2 1 1
#a3 0 0 1
# let's do a bigger example
set.seed(1)
m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10, dimnames=list(paste("instance_", 1:5, sep=""), c(paste("a",1:4,sep=""), paste("b",1:6,sep=""))))
# Notice that the numbers of a and b columns are not equal
#> m
# a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
#instance_1 1 0 1 1 0 1 1 1 0 0
#instance_2 1 0 1 1 1 1 1 0 1 1
#instance_3 1 1 1 0 1 1 1 1 0 1
#instance_4 0 1 1 1 1 0 1 1 1 1
#instance_5 1 1 0 0 1 1 0 1 1 1
fun(as.data.frame(m))
#> fun(as.data.frame(m))
# b1 b2 b3 b4 b5 b6
#a1 3 4 3 3 2 3
#a2 3 2 2 3 2 3
#a3 3 3 4 3 2 3
#a4 2 2 3 2 2 2
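Selecting columns by name pattern is convenient but can be fragile: an unanchored pattern such as "a" also matches any other column whose name merely contains that letter. As a hedged sketch, the helper below picks columns by name prefix with `startsWith()` (base R 3.3.0 or later) and fills the count matrix with an explicit double loop. The function name `count_pairs`, its arguments, and the extra `label` column are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: count rows where an 'a' column and a 'b' column are both 1
count_pairs <- function(x, a_prefix = "a", b_prefix = "b") {
  # drop = FALSE keeps a data frame even when a single column matches
  a <- x[, startsWith(colnames(x), a_prefix), drop = FALSE]
  b <- x[, startsWith(colnames(x), b_prefix), drop = FALSE]
  out <- matrix(0L, nrow = ncol(a), ncol = ncol(b),
                dimnames = list(colnames(a), colnames(b)))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      out[i, j] <- sum(a[[i]] == 1 & b[[j]] == 1)
    }
  }
  out
}

# Example data from the question plus an unrelated column whose name
# contains both "a" and "b" but starts with neither
df <- data.frame(label = c("x", "y", "z"),
                 a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                 b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

res <- count_pairs(df)
res
#    b1 b2 b3
# a1  1  1  0
# a2  2  1  1
# a3  0  0  1
```

The double loop is slower than a vectorized form on wide data, but it keeps the pairing logic explicit and handles the single-column case without special treatment.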
Upvotes: 0