Biclustering categorical data by two variables

Question

I have a table of categorical values that I would like to cluster by both rows and columns.

Example data: test_dataset.csv

I,II,III,IV,V
A,0,3,3,2,3
B,0,3,3,0,0
C,0,0,3,3,3
D,0,3,1,3,0
E,0,0,3,0,0

The values encode four levels: 0 = "no data", 1 = "no increase", 2 = "mixed", and 3 = "increase".
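To make the coding explicit, here is a short base-R sketch that types the example table in directly (so it runs without test_dataset.csv) and maps the numeric codes to the level names, recoding 0 ("no data") as NA. The variable names are my own choice.

```r
# Example table from test_dataset.csv, typed in so the sketch is self-contained
codes <- matrix(c(0, 3, 3, 2, 3,
                  0, 3, 3, 0, 0,
                  0, 0, 3, 3, 3,
                  0, 3, 1, 3, 0,
                  0, 0, 3, 0, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D", "E"),
                                c("I", "II", "III", "IV", "V")))

# Translate codes 0-3 into level names; code 0 ("no data") becomes NA
level_names <- c(NA, "no increase", "mixed", "increase")
labelled <- codes
labelled[] <- level_names[codes + 1]   # codes + 1 is an index in 1..4

print(labelled)
# Count of each observed level, plus the missing cells
print(table(labelled, useNA = "ifany"))
# "No data" cells per column: column I has no observed values at all
print(colSums(is.na(labelled)))
```

The last line makes it easy to see that column I carries no information once "no data" is treated as missing.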

I found the R package blockcluster, which in theory should be able to do this:

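#install.packages("blockcluster")
library(blockcluster)
# Codes: 0 = no data, 1 = no increase, 2 = mixed, 3 = increase
# The header has one field fewer than the data rows, so read.table uses the first column (A-E) as row names
dataset <- read.table("test_dataset.csv", header = TRUE, sep = ",")
# nbcocluster = c(number of row clusters, number of column clusters)
out <- coclusterCategorical(as.matrix(dataset), nbcocluster = c(3, 2))
summary(out)
plot(out)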

This is the resulting plot:

If anyone has worked with this package before, I would appreciate help interpreting this plot: how do I tell which row or column of the original data ends up where in the co-clustered data?
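As far as I understand, a co-clustering plot shows the same matrix with its rows and columns permuted so that members of the same cluster sit next to each other; to map positions back to the original rows and columns you need the cluster label vectors. In blockcluster these appear to be stored on the fitted object (slots such as `out@rowclass` and `out@colclass`; this is an assumption, so check `str(out)` for your version). The sketch below uses hypothetical labels in their place and shows both the cluster membership lists and the reordered matrix.

```r
codes <- matrix(c(0, 3, 3, 2, 3,
                  0, 3, 3, 0, 0,
                  0, 0, 3, 3, 3,
                  0, 3, 1, 3, 0,
                  0, 0, 3, 0, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D", "E"),
                                c("I", "II", "III", "IV", "V")))

# Hypothetical labels standing in for the fitted result (3 row clusters, 2 column clusters);
# the assumed slots out@rowclass / out@colclass would supply these in practice
row_labels <- c(1, 2, 1, 3, 2)   # one label per row, in the order A-E
col_labels <- c(1, 2, 2, 1, 2)   # one label per column, in the order I-V

# Which original rows and columns belong to each cluster
print(split(rownames(codes), row_labels))
print(split(colnames(codes), col_labels))

# Permute rows and columns so that members of the same cluster are adjacent;
# order() keeps ties in their original order, so rows within a cluster stay A-E ordered
blocked <- codes[order(row_labels), order(col_labels), drop = FALSE]
print(blocked)
```

Because `blocked` keeps its row and column names, each block in the reordered matrix can be read directly as "rows X, Y with columns P, Q".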

If I am not mistaken, the nbcocluster parameter sets the number of row clusters and the number of column clusters. How do I know beforehand what the appropriate number of clusters is?

Is categorical clustering appropriate when one of the categories ("no data") is essentially missing data?
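One common alternative to treating "no data" as a fourth category is to treat it as missing, because otherwise two rows can look alike merely because they are unobserved in the same places. A minimal base-R sketch of that idea (my own illustration, not something blockcluster does): recode 0 to NA, drop columns with no observed values, and compute a simple-matching dissimilarity between rows using only the cells observed in both rows. The function name `matching_dissim` is hypothetical.

```r
codes <- matrix(c(0, 3, 3, 2, 3,
                  0, 3, 3, 0, 0,
                  0, 0, 3, 3, 3,
                  0, 3, 1, 3, 0,
                  0, 0, 3, 0, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D", "E"),
                                c("I", "II", "III", "IV", "V")))
codes[codes == 0] <- NA   # treat "no data" as missing rather than as a category

# Drop columns with no observed values (column I here)
codes <- codes[, colSums(!is.na(codes)) > 0, drop = FALSE]

# Share of mismatching cells between each pair of rows, counting only cells
# observed in both rows; NA if a pair shares no observed cell
matching_dissim <- function(x) {
  n <- nrow(x)
  d <- matrix(NA_real_, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both <- !is.na(x[i, ]) & !is.na(x[j, ])
      if (any(both)) d[i, j] <- mean(x[i, both] != x[j, both])
    }
  }
  d
}

d_rows <- matching_dissim(codes)
print(round(d_rows, 2))
```

The caveat is visible in this example: row E has a single observed cell, so its dissimilarities rest on one comparison each and should be trusted accordingly.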

I am open to suggestions for other methods that can bicluster categorical data. Any help is appreciated; I have never done this before.
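A simpler alternative, named plainly here because it is not a true co-clustering model (rows and columns are clustered independently rather than jointly), is two-way hierarchical clustering: cluster the rows on a simple-matching dissimilarity that ignores "no data" cells, do the same for the columns, then reorder the matrix by both sets of labels. The sketch below assumes every pair of rows and every pair of columns shares at least one observed cell, which holds for this example once column I is dropped; `matching_dissim` is a hypothetical helper, and k = 3 and k = 2 simply mirror `nbcocluster = c(3, 2)`.

```r
codes <- matrix(c(0, 3, 3, 2, 3,
                  0, 3, 3, 0, 0,
                  0, 0, 3, 3, 3,
                  0, 3, 1, 3, 0,
                  0, 0, 3, 0, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D", "E"),
                                c("I", "II", "III", "IV", "V")))
codes[codes == 0] <- NA   # "no data" -> missing
# Drop rows and columns with no observed values (column I here)
codes <- codes[rowSums(!is.na(codes)) > 0, colSums(!is.na(codes)) > 0, drop = FALSE]

# Share of mismatching cells between rows of x, using cells observed in both rows
matching_dissim <- function(x) {
  n <- nrow(x)
  d <- matrix(NA_real_, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both <- !is.na(x[i, ]) & !is.na(x[j, ])
      if (any(both)) d[i, j] <- mean(x[i, both] != x[j, both])
    }
  }
  if (anyNA(d)) stop("some pairs share no observed cells")
  d
}

# Average-linkage trees for rows and for columns (columns via the transpose)
hc_rows <- hclust(as.dist(matching_dissim(codes)), method = "average")
hc_cols <- hclust(as.dist(matching_dissim(t(codes))), method = "average")

# Cut each tree into a chosen number of groups
row_cl <- cutree(hc_rows, k = 3)
col_cl <- cutree(hc_cols, k = 2)
print(row_cl)
print(col_cl)

# Reorder the matrix so that rows and columns in the same group are adjacent
print(codes[order(row_cl), order(col_cl), drop = FALSE])
```

On this data the columns split into {II, III, V} and {IV}, and row D ends up in a group of its own; with several zero dissimilarities among A, B, C and E, how those four rows split can depend on tie-breaking. If a picture is wanted, `stats::heatmap()` can draw the matrix with both trees when they are passed as `Rowv = as.dendrogram(hc_rows)` and `Colv = as.dendrogram(hc_cols)`.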
