S12000
S12000

Reputation: 3396

Table frequency from multiple col and multiple row in R

I am trying to get a frequency table from this dataframe:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                       .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                       class = "data.frame", row.names = c(NA, -3L))


tmp2 <- read.csv("tmp2.csv", sep=";")
tmp2
> tmp2
  a1 a2 a3 b1 b2 b3
1  1  1  0  1  1  0
2  0  0  1  0  0  1
3  0  1  0  1  0  1

I try to get a frequency table as follow:

table(tmp2[,1:3], tmp2[,4:6])

But I get :

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Expected output:

enter image description here

Info: It is not necessary a square matrix for instance I should be able to add b4 b5 and keep a1 a2 a3

Upvotes: 5

Views: 187

Answers (3)

nicola
nicola

Reputation: 24510

An option:

matrix(colSums(tmp2[,rep(1:3,3)] & tmp2[,rep(4:6,each=3)]),
       ncol=3,nrow=3,
       dimnames=list(colnames(tmp2)[1:3],colnames(tmp2)[4:6]))
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

If you have a different number of a and b columns, you can try:

acols<-1:3 #state the indices of the a columns
bcols<-4:6 #same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[,rep(acols,length(bcols))] & tmp2[,rep(bcols,each=length(acols))]),
           ncol=length(bcols),nrow=length(acols),
           dimnames=list(colnames(tmp2)[acols],colnames(tmp2)[bcols]))

Upvotes: 5

digEmAll
digEmAll

Reputation: 57220

Here's a possible solution :

aIdxs <- 1:3
bIdxs <- 4:7

# init matrix
m <- matrix(0,
            nrow = length(aIdxs), ncol=length(bIdxs),
            dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))

# create all combinations of a's and b's column indexes
idxs <- expand.grid(aIdxs,bIdxs)

# for each line and for each combination we add 1
# to the matrix if both a and b column are 1 
for(r in 1:nrow(tmp2)){
  m <- m + matrix(apply(idxs,1,function(x){ all(tmp2[r,x]==1) }),
                  nrow=length(aIdxs), byrow=FALSE)
}
> m
   b1 b2 b3
a1  1  1  0
a2  2  1  1
a3  0  0  1

Upvotes: 1

Teemu Daniel Laajala
Teemu Daniel Laajala

Reputation: 2366

An another possible solution here. Your input is a bit tricky for 'table', as you inherently have two sets 'a' and 'b' with binary indicators in each row indicating pairwise instances only between 'a' and 'b', and you want to loop over them. Below is a generalized (but maybe not so elegant) function that would work with different length 'a's and 'b's:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 
                                                              1L, 0L), b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 
                                                                                                                      1L)), .Names = c("a1", "a2", "a3", "b1", "b2", "b3"), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                -3L))                                                                                                                                                                                                               
fun = function(x) t(do.call("cbind", lapply(x[,grep("a", colnames(x))], 
    function(p) rowSums(do.call("rbind", lapply(x[,grep("b", colnames(x))], 
    function(q) q*p ))))))
fun(tmp2)
#> fun(tmp2)
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

# let's do a bigger example
set.seed(1)
m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10, dimnames=list(paste("instance_", 1:5, sep=""), c(paste("a",1:4,sep=""), paste("b",1:6,sep=""))))

# Notice that the count of possible a and b elements are not equal
#> m
#           a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
#instance_1  1  0  1  1  0  1  1  1  0  0
#instance_2  1  0  1  1  1  1  1  0  1  1
#instance_3  1  1  1  0  1  1  1  1  0  1
#instance_4  0  1  1  1  1  0  1  1  1  1
#instance_5  1  1  0  0  1  1  0  1  1  1

fun(as.data.frame(m))
#> fun(as.data.frame(m))
#   b1 b2 b3 b4 b5 b6
#a1  3  4  3  3  2  3
#a2  3  2  2  3  2  3
#a3  3  3  4  3  2  3
#a4  2  2  3  2  2  2

Upvotes: 0

Related Questions