A. Donda
A. Donda

Reputation: 8467

detect exact blocks in matrix (two-dimensional array)

I'm looking for an efficient algorithm to identify a block structure in a matrix with many 0 entries.

For example, the 6×7 matrix

0.0975    0.9575         0         0         0         0         0
0.2785    0.9649         0         0         0         0         0
0.5469    0.1576         0         0         0         0         0
     0         0    0.9706    0.9572         0         0         0
     0         0         0         0    0.8235    0.3171    0.0344
     0         0         0         0    0.6948    0.9502    0.4387

consists of three blocks of sizes 3×2, 1×2, and 2×3, respectively.

A block is defined by a set of rows and a set of columns. A block structure is characterized by the fact that all entries that do not belong to a block are 0 exactly. However, there may be exact-0 entries also within the blocks.

A trivial solution is to always declare the whole matrix a block; therefore, a solution is sought such that the number of within-block entries is as small as possible.

To make things harder (or maybe easier?), the blocks do not have to be contiguous. A permuted version of the above matrix,

     0    0.9572         0         0         0         0    0.9706
     0         0    0.0975         0         0    0.9575         0
0.4387         0         0    0.9502    0.6948         0         0
0.0344         0         0    0.3171    0.8235         0         0
     0         0    0.2785         0         0    0.9649         0
     0         0    0.5469         0         0    0.1576         0

therefore also has a three-block structure, which can be described as:

Solutions I have thought of are:

I'm looking for an algorithm, or other ideas for an algorithm, not for an implementation. However, an implementation e.g. in Matlab or Python would be welcome.

Upvotes: 2

Views: 323

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77454

This is a standard scenario in general expression analysis.

The algorithms for this are known as biclustering (because they cluster rows and columns at the same time). An early method is due to Cheng and Church.

Upvotes: 2

Related Questions