Reputation: 53598
I've been trying to find something that automatically finds all shared regions between two images, explicitly not based on pixel-matching or differencing, and I'm basically coming up with nothing after a fair bit of searching.
Say I have the following two images, in this case website screenshots. The first is the "baseline":
and the second is very similar, but with some modified CSS so entire blocks have been moved around. No text content changes, no box dimension changes, just some elements repositioned:
In this case (but really in any case where one image is a derivative of the other) their pixel diff is effectively useless for seeing what changed:
In fact, even if we apply some simple diff exaggeration, the result is still fairly useless because we're still looking at pixel diffs, instead of diffs based on what changed, so we won't (in any way) be looking at the actual modifications to the visual information:
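(For clarity, this is roughly the kind of naive pixel diff, with exaggeration, that I mean; a minimal opencv sketch, assuming imageA and imageB are already loaded and the same size:)
import cv2
diff = cv2.absdiff(imageA, imageB)                 # per-pixel difference
exaggerated = cv2.convertScaleAbs(diff, alpha=5)   # scale the differences up so they're visible
cv2.imwrite("pixel-diff.png", exaggerated)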
So this is like comparing two books and then deciding the books are different based on how many values for n we can find for which book1.letters[n] != book2.letters[n] ...
So, what I'm looking for is a way to compute regions of similarity, showing which parts of the two images encode the same information, but not necessarily in the same bounding boxes.
For instance, in the above two images, almost all the data is the same, just with some parts relocated. The only true difference is that there's mystery whitespace.
With similar regions color coded:
and the correspondence:
I can't find a single tool that does this, and I can't even find tutorials on implementing it with opencv or similar technologies. Maybe I'm looking for the wrong terms, or maybe no one has ever written an image comparison tool for this (which seems hard to believe?), so at the risk of this being off topic: I've searched and researched as much as I can here. What are my options if I need this as a tool that can run as part of a normal (open source) tool chain for QA/testing? (So: not some expensive plugin to equally expensive commercial software.)
Upvotes: 3
Views: 3798
Reputation: 53598
To answer my own question: opencv (for python) paired with scikit-image can pretty much get us there in two steps: structural similarity to find the regions that differ, and template matching to figure out which of those are relocations rather than real changes.
In code, assuming two images imageA and imageB, with the same dimensions:
import cv2
import imutils
from skimage.metrics import structural_similarity

# ...a bunch of functions will be going here...

diffs = compare(imageA, imageB, gray(imageA), gray(imageB), [])

if len(diffs) > 0:
    highlight_diffs(imageA, imageB, diffs)
else:
    print("no differences detected")
with:
def gray(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

def compare(or1, or2, im1, img2, diffs):
    (score, diff) = structural_similarity(im1, img2, full=True)
    diff = (diff * 255).astype("uint8")
    thresh = cv2.threshold(diff, 0, 255,
        cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    contours = cv2.findContours(thresh.copy(),
        cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = imutils.grab_contours(contours)

    # aggregate the contours, throwing away duplicates
    for c in contours:
        (x, y, w, h) = cv2.boundingRect(c)
        region = [x, y, x + w, y + h]
        try:
            diffs.index(region)
        except ValueError:
            diffs.append(region)

    return diffs
Now, cv2.RETR_EXTERNAL is supposed to only yield "external" contours, i.e. if there are diffs inside other diffs (say a box's border color changed, and some text inside the box also changed), it should just yield one box, the outer ("external") one.
Except that's not what it does, so I wrote a dumb function that naively weeds out inner boxes:
def filter_diffs(diffs):
    def not_contained(e, diffs):
        for t in diffs:
            if e[0] > t[0] and e[2] < t[2] and e[1] > t[1] and e[3] < t[3]:
                return False
        return True

    return [e for e in diffs if not_contained(e, diffs)]
which then gets used in the function that highlights the differences using color rectangles.
RED = (0, 0, 255)

def highlight_diffs(a, b, diffs):
    diffed = b.copy()
    for area in filter_diffs(diffs):
        x1, y1, x2, y2 = area
        cv2.rectangle(diffed, (x1, y1), (x2, y2), RED, 2)
    cv2.imshow("Diffed", diffed)
This gets us the first part. Taking a screenshot of Stack Overflow, and then another screenshot after moving the left advertisement down and recoloring the --yellow-100 CSS variable:
This finds five diffs, but two of them aren't really "diffs" in the sense of new or removed content; rather, they're the result of "we moved a thing down".
So, let's add in the template matching:
# BGR colours used below
BLUE = (255, 0, 0)
GREEN = (0, 255, 0)

def highlight_diffs(a, b, diffs):
    diffed = b.copy()
    for area in filter_diffs(diffs):
        x1, y1, x2, y2 = area

        # is this a relocation, or an addition/deletion?
        org = find_in_original(a, b, area)

        if org is not None:
            cv2.rectangle(a, (org[0], org[1]), (org[2], org[3]), BLUE, 2)
            cv2.rectangle(diffed, (x1, y1), (x2, y2), BLUE, 2)
        else:
            cv2.rectangle(diffed, (x1+2, y1+2), (x2-2, y2-2), GREEN, 1)
            cv2.rectangle(diffed, (x1, y1), (x2, y2), RED, 2)

    cv2.imshow("Original", a)
    cv2.imshow("Diffed", diffed)
    cv2.waitKey(0)
With the following code for the template matching, using an incredibly strict threshold for "is the match we found actually good":
def find_in_original(a, b, area):
    crop = b[area[1]:area[3], area[0]:area[2]]
    result = cv2.matchTemplate(crop, a, cv2.TM_CCOEFF_NORMED)

    (minVal, maxVal, minLoc, maxLoc) = cv2.minMaxLoc(result)
    (startX, startY) = maxLoc
    endX = startX + (area[2] - area[0])
    endY = startY + (area[3] - area[1])
    ocrop = a[startY:endY, startX:endX]

    # this basically needs to be a near-perfect match
    # for us to consider it a "moved" region rather than
    # a genuine difference between A and B.
    if structural_similarity(gray(ocrop), gray(crop)) >= 0.99:
        return [startX, startY, endX, endY]

    return None
We can now compare the original and modified image, and see that the ad got moved in the modified image, rather than being "new content", and we can see where it can be found in the original:
And that's it, we have a visual diff that actually tells us something useful about the changes, rather than telling us which pixel happens to be a different colour.
We could bring the template matching threshold down a little, to say 0.95, in which case the whitespace box would also end up matched to the original image, but because it's just whitespace it'll get matched to something mostly meaningless (in this particular case, it matches the whitespace in the lower right of the original).
Of course, a quality-of-life improvement would be to cycle through colours so that the various moved parts can all be related to each other by their shared colour, but that's the kind of thing anyone can probably tack on top of this code themselves.
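As a rough sketch of that idea (reusing filter_diffs and find_in_original from above; the palette is arbitrary), cycling colours for relocated regions could look something like this:
import itertools

# any set of distinguishable BGR colours will do
COLOURS = [(255, 0, 0), (0, 255, 255), (255, 0, 255), (0, 165, 255)]

def highlight_diffs_coloured(a, b, diffs):
    original = a.copy()
    diffed = b.copy()
    palette = itertools.cycle(COLOURS)
    for area in filter_diffs(diffs):
        x1, y1, x2, y2 = area
        org = find_in_original(a, b, area)
        if org is not None:
            # relocation: draw the same colour in both images so the
            # correspondence is obvious at a glance
            colour = next(palette)
            cv2.rectangle(original, (org[0], org[1]), (org[2], org[3]), colour, 2)
            cv2.rectangle(diffed, (x1, y1), (x2, y2), colour, 2)
        else:
            # genuine addition/removal: keep it red
            cv2.rectangle(diffed, (x1, y1), (x2, y2), RED, 2)
    cv2.imshow("Original", original)
    cv2.imshow("Diffed", diffed)
    cv2.waitKey(0)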
Upvotes: 2
Reputation: 862
Unfortunately I couldn't produce the exact desired result, but with a fairly blunt algorithm I got somewhat close. The general algorithm is:
1. Add the same random noise to each image.
See the first and third pane in figure 1. Adding the same noise to both images ensures that featureless regions (e.g. a white background) can be compared via a phase correlation (below).
2. Populate a matrix of zeros with a boxed subsection from image 1.
An example of this matrix is given in the middle pane of figure 1. This matrix must have the same dimensions as image 1 and image 2.
3. Perform a phase correlation between this matrix and noisy image 2, and record the x and y shifts at the correlation peak.
There are several knobs that you can turn here that can improve the end result. See Performing a phase correlation with fft in R. These shift values indicate how the matrix from step 2 should be shifted in the x and y directions so that it best matches noisy image 2.
4. Repeat steps 2 and 3 for boxed subsections across all of image 1.
This is done by looping over the rows and columns of image 1. You can loop over every index or skip several indices.
Notice that the result is not exactly what you're looking for, but it's close. Basically the red regions indicate that the corresponding regions in image 1 don't have to be shifted. Yellow regions (in this case) need to be shifted slightly down, orange more so, and white regions need to be shifted up.
Again, adding the same noise to images 1 and 2 is an important step. The algorithm depends on isolating small boxed regions (in the example code, I used 50x50 pixel boxes). As you loop through the rows and columns of image 1 and isolate the corresponding boxed regions, several of those boxes will contain no features. This poses a problem for the phase correlation, because a featureless box will have high correlation values in every region that has a similar featureless background. Effectively, adding noise adds features to both images in order to reduce ambiguous phase correlations.
The reason this algorithm doesn't produce the desired result is that the boxed regions aren't chosen in a clever way: they are simply selected as you loop over the rows and columns of image 1. Therefore, depending on the box size you choose, some boxed regions will contain features that are translated differently when compared to image 2. Perhaps this algorithm would work better after the region clustering algorithm proposed by yapws87.
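For what it's worth, the phase correlation step itself is compact. Here is a minimal Python/NumPy sketch of steps 1-3 above, purely as an illustration (the full implementation is the R code below; img1 and img2 are assumed to be same-size grayscale float arrays):
import numpy as np

def phase_correlation_peak(box, img):
    # cross-power spectrum of the zero-padded box and the second image;
    # the location of the peak gives the (row, col) shift that best
    # aligns the box with image 2
    F1 = np.fft.fft2(box)
    F2c = np.conj(np.fft.fft2(img))
    cross = F1 * F2c
    corr = np.real(np.fft.ifft2(cross / (np.abs(cross) + 1e-9)))
    corr = np.fft.fftshift(corr)
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(0)
noise = 0.1 * rng.uniform(size=img1.shape)
noisy1, noisy2 = img1 + noise, img2 + noise   # the SAME noise on both images
box = np.zeros_like(noisy1)
box[0:50, 0:50] = noisy1[0:50, 0:50]          # one 50x50 boxed subsection
print(phase_correlation_peak(box, noisy2))    # peak location -> shift for this box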
Here is the R code to produce these results:
## the readJPEG function comes from the jpeg package
library(jpeg)

## read in the images
img1 <- readJPEG('./img1.jpg')
img2 <- readJPEG('./img2.jpg')
## grayscale the images
img1 <- (img1[,,1]+img1[,,2]+img1[,,3])/3
img2 <- (img2[,,1]+img2[,,2]+img2[,,3])/3
## rotate the images for more intuitive R plotting
img1 <- t(apply(img1,2,rev))
img2 <- t(apply(img2,2,rev))
## create some uniform noise
noise <- matrix(runif(n=nrow(img1)*ncol(img1)),nrow=nrow(img1),ncol=ncol(img1))*0.1
## add the SAME noise to both images
img1 <- noise+img1
img2 <- noise+img2
## remove the mean from both images (this may not be necessary)
img1 <- img1/mean(img1)
img2 <- img2/mean(img2)
## Take the conjugate of the fft of the second image
IMG2c <- Conj(fft(img2))

## fftshift is not part of base R; this is one minimal definition
## (a quadrant swap, like MATLAB's fftshift -- see the linked question
## for alternatives)
fftshift <- function(m) {
    rh <- ceiling(nrow(m)/2)
    ch <- ceiling(ncol(m)/2)
    m[c((rh+1):nrow(m), 1:rh), c((ch+1):ncol(m), 1:ch)]
}
## define how to loop through the first image
row.step=50
col.step=50
## create a zero image (made with all 0s)
zero.img <- matrix(0,ncol=ncol(img1),nrow=nrow(img1))
## initialize some vectors to hold the x and y
## shifts that correspond to the highest phase correlation value
shift.x.vec=NULL
shift.y.vec=NULL
## keep track of how many iterations you go through
i.iters=1
## loop over the rows
i=1
while((i+row.step-1)<nrow(img1)) {

    ## keep track of how many iterations you go through
    j.iters=1

    ## loop over the columns
    j=1
    while((j+col.step-1)<ncol(img1)) {

        ## define a current 'box' as the zero image
        cbox1 <- zero.img

        ## then populate a small box with values from image 1
        cbox1[i:(i+row.step-1),j:(j+col.step-1)] <- img1[i:(i+row.step-1),j:(j+col.step-1)]

        ## PERFORM THE PHASE CORRELATION
        ## go into the frequency domain
        CBOX1 <- fft(cbox1)

        ## find a normalized value
        norm <- abs(CBOX1 * IMG2c)

        ## perform the phase correlation and go back to the space domain
        corr <- Re(fft((CBOX1 * IMG2c)/norm,inv=TRUE)/length(CBOX1))

        ## this rearranges the quadrants of the matrix, see
        ## MATLAB's function fftshift
        corr <- fftshift(corr)

        ## find the x and y index values associated with the
        ## highest correlation value
        shift <- which(corr==max(corr),arr.ind=TRUE)
        shift.x <- shift[1]
        shift.y <- shift[2]

        ## populate the x and y shift vectors
        shift.x.vec <- c(shift.x.vec,shift.x)
        shift.y.vec <- c(shift.y.vec,shift.y)

        ## THIS IS ADDITIONAL PLOTTING AND CAN BE IGNORED
        if(i.iters==6 & j.iters==6) {
            dev.new()
            ##jpeg('./example.jpeg',width=900,height=700)
            split.screen(c(1,3))
            screen(1)
            image(1:nrow(img1),1:ncol(img1),img1,col=gray.colors(200),axes=FALSE,ylab="",xlab="",useRaster=TRUE,main='Noisy Image 1')
            rect(j,i,(j+col.step-1),(i+row.step-1))
            screen(2)
            image(cbox1,col=gray.colors(200),axes=FALSE,useRaster=TRUE,main='Current Box')
            screen(3)
            image(img2,col=gray.colors(200),axes=FALSE,useRaster=TRUE,main='Noisy Image 2')
            ##dev.off()
        }

        j.iters=j.iters+1
        j=j+col.step
    }

    i.iters=i.iters+1
    i=i+row.step
}
## make a matrix of shifts values
## in this example, only the y shifts are interesting though
shift.x.mat <- matrix(shift.x.vec,ncol=j.iters-1,nrow=i.iters-1,byrow=TRUE)
shift.y.mat <- matrix(shift.y.vec,ncol=j.iters-1,nrow=i.iters-1,byrow=TRUE)
##jpeg('./final.jpeg',width=800,height=800)
image(shift.y.mat,axes=FALSE,useRaster=TRUE)
##dev.off()
Upvotes: 0
Reputation: 1839
Here is a suggestion for the initial region clustering.
First we subtract the two images to find the regions that differ. Then we resize the result to a smaller scale for faster processing and easier clustering.
Then we run a morphological close operation to cluster all nearby objects together.
Threshold the result to obtain the strong signals.
Run connected component analysis to get all the bounding boxes.
Then check all the box intersections and union them. In my case, I just redrew all the bounding boxes filled in ("solid mode") and re-analyzed the components to obtain the merged regions.
Once we have this, we can run the same process on the second image and cross match each of the extracted regions using a simple cross correlation matching method or any other fancy matching method. In this case, a simple width and height match between regions will do as well (a minimal sketch of that matching step follows after the code).
Here is the code I made. I hope it helps.
import cv2
import numpy as np
# Function to fill all the bounding boxes
def fill_rects(image, stats):
    for i, stat in enumerate(stats):
        if i > 0:
            p1 = (stat[0], stat[1])
            p2 = (stat[0] + stat[2], stat[1] + stat[3])
            cv2.rectangle(image, p1, p2, 255, -1)
# Load image file
img1 = cv2.imread('img1.jpg',0)
img2 = cv2.imread('img2.jpg',0)
# Subtract the 2 image to get the difference region
img3 = cv2.subtract(img1,img2)
# Make it smaller to speed up everything and easier to cluster
small_img = cv2.resize(img3,(0,0),fx = 0.25, fy = 0.25)
# Morphological close process to cluster nearby objects
fat_img = cv2.dilate(small_img, None,iterations = 3)
fat_img = cv2.erode(fat_img, None,iterations = 3)
fat_img = cv2.dilate(fat_img, None,iterations = 3)
fat_img = cv2.erode(fat_img, None,iterations = 3)
# Threshold strong signals
_, bin_img = cv2.threshold(fat_img,20,255,cv2.THRESH_BINARY)
# Analyse connected components
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bin_img)
# Cluster all the intersected bounding box together
rsmall, csmall = np.shape(small_img)
new_img1 = np.zeros((rsmall, csmall), dtype=np.uint8)
fill_rects(new_img1,stats)
# Analyse New connected components to get final regions
num_labels_new, labels_new, stats_new, centroids_new = cv2.connectedComponentsWithStats(new_img1)
labels_disp = np.uint8(200*labels/np.max(labels)) + 50
labels_disp2 = np.uint8(200*labels_new/np.max(labels_new)) + 50
cv2.imshow('diff',img3)
cv2.imshow('small_img',small_img)
cv2.imshow('fat_img',fat_img)
cv2.imshow('bin_img',bin_img)
cv2.imshow("labels",labels_disp)
cv2.imshow("labels_disp2",labels_disp2)
cv2.waitKey(0)
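The cross-matching step mentioned above isn't in the code; here is a minimal sketch of the simple width/height matching, assuming stats1 and stats2 are the stats arrays from cv2.connectedComponentsWithStats run on the region masks of the two images (these names are just for illustration):
# Match regions between the two images by comparing box widths/heights;
# label 0 is the background component, so it is skipped
def match_regions(stats1, stats2, tol=0.1):
    matches = []
    for i, (xa, ya, wa, ha, areaa) in enumerate(stats1[1:], start=1):
        for j, (xb, yb, wb, hb, areab) in enumerate(stats2[1:], start=1):
            if abs(wa - wb) <= tol * wa and abs(ha - hb) <= tol * ha:
                matches.append((i, j))
                break
    return matches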
Upvotes: 3
Reputation:
Suggestion:
The problem can be much eased if you are able to segment out the blue sentences, which can be achieved by morphological dilation followed by binarization. If the dilation is strong enough that all characters come into contact (while distinct rows of text stay apart), connected components labeling can extract whole lines.
Now you have bounding boxes and the number of positions to be tried is much reduced.
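A minimal opencv sketch of that idea (here binarizing first and then dilating; the filename, kernel size and threshold are placeholder guesses that would need tuning):
import cv2

img = cv2.imread('page1.jpg', 0)  # hypothetical filename, loaded as grayscale
# binarize so the text becomes white on a black background
_, binimg = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# dilate with a wide, short kernel so characters within a line merge
# while separate rows of text stay apart
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 3))
lines = cv2.dilate(binimg, kernel, iterations=1)
# connected components labeling -> one bounding box per line of text
n, labels, stats, centroids = cv2.connectedComponentsWithStats(lines)
line_boxes = [tuple(s[:4]) for s in stats[1:]]  # (x, y, w, h), background skipped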
Also look at the diff algorithm, which can be relevant for sequential text. https://en.wikipedia.org/wiki/Diff
Upvotes: 0