enzom83

Reputation: 8320

Is there a way to identify regions that are not very similar from a set of images?

Given an image, I would like to extract several subimages from it, but the resulting subimages must not be overly similar to each other. If the center of each ROI is chosen randomly, then we must make sure that each subimage shares at most a small percentage of its area with the other subimages. Alternatively, we could decompose the image into small regions over a regular grid and then randomly choose a subimage within each region. This option, however, does not ensure that all subimages are sufficiently different from each other. Obviously I have to choose a good way to compare the resulting subimages, as well as a similarity threshold.
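For the random-placement option, the overlap constraint can be enforced by rejection sampling. The sketch below assumes fixed-size axis-aligned ROIs given as hypothetical `(x, y, w, h)` tuples; drawing a random top-left corner is equivalent to drawing a random center when every ROI has the same size. `max_overlap` is an illustrative threshold, not a recommended value.

```python
import random


def overlap_fraction(a, b):
    """Fraction of box a's area covered by box b; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / (aw * ah)


def sample_rois(img_w, img_h, roi_w, roi_h, count, max_overlap=0.1,
                max_tries=10_000, seed=None):
    """Place ROIs at random, rejecting any that overlap an accepted ROI too much."""
    rng = random.Random(seed)
    accepted = []
    tries = 0
    while len(accepted) < count and tries < max_tries:
        tries += 1
        # Random top-left corner such that the ROI lies fully inside the image.
        x = rng.randint(0, img_w - roi_w)
        y = rng.randint(0, img_h - roi_h)
        cand = (x, y, roi_w, roi_h)
        if all(overlap_fraction(cand, r) <= max_overlap for r in accepted):
            accepted.append(cand)
    return accepted
```

If the image is too crowded to fit `count` ROIs under the threshold, the loop gives up after `max_tries` and returns fewer boxes.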

The above procedure must be performed on many images, and all the extracted subimages should not be too similar. Is there a way to identify regions that are not very similar across a set of images (e.g. by inspecting all histograms)?
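As an illustration of the histogram idea, one option is histogram intersection (1.0 for identical normalized histograms, 0.0 for disjoint ones) combined with a greedy filter. This sketch assumes grayscale tiles stored as NumPy arrays with values in [0, 1]; the bin count and the `max_similarity` threshold are arbitrary placeholders.

```python
import numpy as np


def normalized_histogram(tile, bins=32):
    """Intensity histogram of a grayscale tile with values in [0, 1], summing to 1."""
    hist, _ = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 identical, 0.0 disjoint."""
    return float(np.minimum(h1, h2).sum())


def select_dissimilar(tiles, max_similarity=0.8, bins=32):
    """Greedily keep tiles whose histogram is not too similar to any kept tile."""
    kept, kept_hists = [], []
    for i, tile in enumerate(tiles):
        h = normalized_histogram(tile, bins)
        if all(histogram_intersection(h, k) <= max_similarity for k in kept_hists):
            kept.append(i)
            kept_hists.append(h)
    return kept
```

Because the filter is greedy, the result depends on the order of the tiles; shuffling the tiles (or pooling tiles from many images) before filtering avoids always favoring the top-left of each image.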

Upvotes: 1

Views: 249

Answers (2)

mmgp

Reputation: 19241

One possible way is to split your image into n x n squares (save for edge cases), as you pointed out, reduce each of them to a single value, and group each piece with its k nearest pieces according to that value. After you group them, you can select, for example, one piece from each group. Something that is potentially better is to use a more relevant metric inside each group; see "Comparing image in url to image in filesystem in python" for two such metrics. By using such a metric, you can select more than one piece from each group.
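One way to read "group them according to k-nearest values" is as a graph: link each piece to its nearest pieces by scalar value and treat each connected component as a group. This pure-Python sketch follows that interpretation (an assumption, not necessarily the exact grouping intended here); `k` counts the piece itself, so each piece gets `k - 1` links.

```python
def knn_groups(values, k=2):
    """Link each piece to its k-1 nearest pieces by scalar value; return the connected components."""
    n = len(values)
    parent = list(range(n))  # union-find forest over piece indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Other pieces sorted by how close their value is to piece i's value.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(values[j] - values[i]))
        for j in others[:k - 1]:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Taking `[g[0] for g in knn_groups(values)]` then gives one representative piece per group.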

Here is an example using a duck picture I found online, with n = 128. To reduce each piece to a single number, it calculates the Euclidean distance to a pure black n x n piece.

f = Import["http://fohn.net/duck-pictures-facts/mallard-duck.jpg"];
pieces = Flatten[ImagePartition[ColorConvert[f, "Grayscale"], 128]]


black = Image[ConstantArray[0, {128, 128}]];
dist = Map[ImageDistance[#, black, DistanceFunction -> EuclideanDistance] &,
            pieces];
nf = Nearest[dist -> pieces];
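For readers without Mathematica, a rough NumPy counterpart of the partition-and-reduce step might look like this. It assumes the grayscale image is already loaded as a 2-D float array (image loading is not shown) and that the Euclidean distance to an all-black tile amounts to the L2 norm of the tile's pixel values; partial tiles at the right and bottom edges are simply dropped.

```python
import numpy as np


def partition(gray, n):
    """Split a 2-D array into non-overlapping n x n tiles, dropping partial edge tiles."""
    rows, cols = gray.shape[0] // n, gray.shape[1] // n
    return [gray[r * n:(r + 1) * n, c * n:(c + 1) * n]
            for r in range(rows) for c in range(cols)]


def distance_to_black(tile):
    """Euclidean distance between the tile's pixels and an all-zero tile of the same size."""
    return float(np.linalg.norm(tile.ravel()))
```

With `pieces = partition(gray, 128)`, the list `[distance_to_black(p) for p in pieces]` plays the role of `dist` above.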

Then we can see the grouping by considering k = 2:

GraphPlot[
 Flatten[Table[
   Thread[pieces[[i]] -> nf[dist[[i]], 2]], {i, Length[pieces]}]],
 VertexRenderingFunction -> (Inset[#2, #, Center, .4] &), 
 SelfLoopStyle -> None]


Now you could use a metric (better than the distance to black) inside each of these groups to select the pieces you want from there.
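As one example of picking several pieces from a single group, farthest-point selection repeatedly takes the piece whose minimum distance to the already chosen pieces is largest. The sketch uses root-mean-square pixel difference as a stand-in metric (an assumption; any of the better metrics mentioned above could replace it) and starts arbitrarily from the first piece of the group.

```python
import numpy as np


def rms_distance(a, b):
    """Root-mean-square pixel difference between two equally sized tiles."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def farthest_point_selection(tiles, count):
    """Pick up to `count` tiles, each time taking the one farthest from those already picked."""
    if not tiles:
        return []
    chosen = [0]  # arbitrary starting tile
    while len(chosen) < min(count, len(tiles)):
        best, best_d = None, -1.0
        for i in range(len(tiles)):
            if i in chosen:
                continue
            # Distance from tile i to its closest already-chosen tile.
            d = min(rms_distance(tiles[i], tiles[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen
```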

Upvotes: 2

s.bandara

Reputation: 5664

Since you would like to apply this to a large number of images, and you already suggested a grid, let's discuss how to solve this problem by selecting dissimilar tiles.

The first step could be to define what "similar" means, so a similarity metric is needed. You already mentioned the tiles' histograms as one source of metrics, but there may be many more, for example:

  • mean intensity,
  • 90th percentile of intensity,
  • 10th percentile of intensity,
  • mode of intensity, as in peak of the histogram,
  • variance of pixel intensity in the whole tile,
  • granularity, which you could quickly approximate by the difference between the raw and the Gaussian-filtered image, or by calculating the average variance in small sub-tiles.
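The per-channel statistics in this list could be collected into one feature vector per tile, roughly as follows. The sketch assumes a single-channel tile as a 2-D NumPy array with values in [0, 1], takes the mode as the center of the tallest histogram bin, and uses the sub-tile variance option for granularity (the Gaussian-filter option would need an extra library). `bins` and `sub` are illustrative choices.

```python
import numpy as np


def tile_features(tile, bins=64, sub=4):
    """Mean, 90th/10th percentile, mode, variance and granularity of one 2-D tile."""
    hist, edges = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    peak = int(np.argmax(hist))
    mode = 0.5 * (edges[peak] + edges[peak + 1])  # center of the tallest bin
    # Granularity: average variance of non-overlapping sub x sub blocks.
    h, w = (tile.shape[0] // sub) * sub, (tile.shape[1] // sub) * sub
    blocks = tile[:h, :w].reshape(h // sub, sub, w // sub, sub)
    granularity = blocks.var(axis=(1, 3)).mean()
    return np.array([
        tile.mean(),
        np.percentile(tile, 90),
        np.percentile(tile, 10),
        mode,
        tile.var(),
        granularity,
    ])
```

For a multi-channel tile, calling this once per channel and concatenating the results gives six components per channel.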

If your image has two channels, the above list already gives you 12 metric components (six per channel). Moreover, there are characteristics that you can obtain from combinations of channels, for example the correlation of pixel intensities between channels. With two channels that is only one characteristic, but with three channels it is already three (one per pair of channels).
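The cross-channel characteristic can be computed as a Pearson correlation for every pair of channels, so C channels yield C(C-1)/2 values. The sketch assumes a tile of shape (height, width, channels); a constant channel has no defined correlation, and NumPy returns NaN (with a warning) in that case.

```python
from itertools import combinations

import numpy as np


def channel_correlations(tile):
    """Pearson correlation of pixel intensities for every pair of channels."""
    flat = tile.reshape(-1, tile.shape[2]).T  # one row of pixels per channel
    return np.array([np.corrcoef(flat[a], flat[b])[0, 1]
                     for a, b in combinations(range(flat.shape[0]), 2)])
```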

Some, if not many, of these metrics will be correlated, so to pick different tiles from this high-dimensional cloud a principal component analysis (PCA) would be a good first step. http://en.wikipedia.org/wiki/Principal_component_analysis
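A minimal PCA can be done with an SVD of the standardized feature matrix (one row per tile, one column per metric). Standardizing is an assumption on my part, so that metrics with large numeric ranges do not dominate; constant columns are left at zero rather than divided by a zero standard deviation.

```python
import numpy as np


def pca_scores(features, n_components=3):
    """Project tiles (rows) onto their leading principal components."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features stay at zero after centering
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return Z @ vt[:n_components].T, explained[:n_components]
```

The `explained` fractions help judge how many components are worth looking at.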

Then, depending on how many sample tiles you would like to choose, you could look at the projection onto the principal components. For seven tiles, for example, I would look at the first three principal components, choose the tiles at the two extremes of each, and then also pick the one tile closest to the center (3 * 2 + 1 = 7).
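The 3 * 2 + 1 rule could be sketched as below, given an array of PCA scores with one row per tile. The center is taken as the mean projection; when the same tile is extreme on two components it is only listed once, so fewer than seven indices may come back.

```python
import numpy as np


def pick_extremes(scores, n_components=3):
    """Min and max tile on each principal component, plus the tile nearest the center."""
    scores = np.asarray(scores, dtype=float)[:, :n_components]
    picks = []
    for pc in range(n_components):
        for idx in (np.argmin(scores[:, pc]), np.argmax(scores[:, pc])):
            if int(idx) not in picks:
                picks.append(int(idx))
    # Distance of every tile from the mean projection.
    centre_dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    for idx in np.argsort(centre_dist):
        if int(idx) not in picks:
            picks.append(int(idx))
            break
    return picks
```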

If you are concerned that choosing from the very extremes of each principal component may not be robust, choosing tiles near the 10th and 90th percentiles may be. Alternatively, you could use a clustering algorithm to find well-separated examples, but this would depend on what your cloud looks like. Good luck.
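The percentile variant only changes the target value on each component: instead of the minimum and maximum, take the tile whose score is closest to the 10th and 90th percentiles. This sketch assumes the same one-row-per-tile score array; NumPy's default linear interpolation is used for the percentiles.

```python
import numpy as np


def pick_percentiles(scores, n_components=3, low=10, high=90):
    """For each principal component, the tiles closest to the low and high percentiles."""
    scores = np.asarray(scores, dtype=float)
    picks = []
    for pc in range(n_components):
        col = scores[:, pc]
        for q in (low, high):
            target = np.percentile(col, q)
            idx = int(np.argmin(np.abs(col - target)))
            if idx not in picks:
                picks.append(idx)
    return picks
```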

Upvotes: 2
