kyrenia

Reputation: 5575

Compare similarity of relatively dissimilar images [websites] with one another

I want to calculate some sort of '% similarity' between two screenshots of websites. Specifically, I want to measure how a particular website changes over time (i.e. to determine which websites tend to keep a consistent look), and also to compare a website's appearance to a set of other websites of a similar class (e.g. all news sites) to see how distinctive it is. [One application I have in mind is analyzing the evolution of 'news' sites: many weblogs tend to look very similar to one another, whereas some news sites are quite distinctive and have changed a lot over time.]

There are quite a few other Stack Overflow questions on comparing the similarity of images; however, they tend to focus on detecting identical or very similar images (e.g. Image comparison - fast algorithm or Image similarity comparison). In contrast, I am looking for some sort of score between images that are quite different. As such, methods like hashing or keypoint matching are probably ruled out (two images that do not share any exact keypoints may still appear quite similar, at least to the eye).

Note: My current idea is to use a color-histogram method, probably with relatively coarse color buckets, since many colors are nearly indistinguishable (e.g. convert the screenshot to 256 colors). It might also be worth comparing whether a site is dominated by one color or uses a wide variety of colors.
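To make the histogram idea concrete, here is a minimal sketch in pure Python. It assumes each screenshot has already been loaded as a flat list of `(r, g, b)` tuples; the function names, the `levels` parameter, and the choice of histogram intersection as the score are all illustrative assumptions, not an established method for this problem.

```python
from collections import Counter

def coarse_histogram(pixels, levels=4):
    """Normalized color histogram with levels**3 coarse RGB buckets.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channels.
    `levels` (hypothetical parameter) is the number of buckets per channel.
    """
    counts = Counter(
        (r * levels // 256, g * levels // 256, b * levels // 256)
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

def dominant_share(hist):
    """Fraction of pixels in the most common bucket (close to 1.0 for a one-color page)."""
    return max(hist.values())

# Toy "screenshots": a mostly white page and a mostly black page.
light_page = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 10
dark_page = [(0, 0, 0)] * 80 + [(255, 0, 0)] * 20

h_light = coarse_histogram(light_page)
h_dark = coarse_histogram(dark_page)
score = histogram_similarity(h_light, h_dark)  # could be reported as a percentage
```

Multiplying `score` by 100 gives a rough '% similarity', and `dominant_share` addresses the one-color-versus-many-colors comparison. Note that a histogram ignores layout entirely, so two pages with the same palette but different structure would score as very similar.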

Upvotes: 0

Views: 198

Answers (1)

aledalgrande

Reputation: 5227

I would probably use HOG (Histogram of Oriented Gradients) on the top of the page, cropped to a fixed size. The resulting descriptor would act like one huge "feature" for the website. You can then calculate a similarity score between the HOG descriptors of different samples.
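As a rough illustration of this suggestion, the sketch below computes a simplified HOG-style descriptor in pure Python and compares two descriptors with cosine similarity. It assumes the screenshots are already cropped to the same fixed size and converted to grayscale as 2D lists; the names `hog_descriptor` and `descriptor_similarity` are hypothetical, and block normalization from the full HOG method is omitted for brevity.

```python
import math

def hog_descriptor(gray, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, concatenated and L2-normalized.

    `gray` is a 2D list of intensities. Both images being compared must
    have the same dimensions so that the descriptors have equal length.
    """
    h, w = len(gray), len(gray[0])
    cells_y, cells_x = h // cell, w // cell
    desc = [0.0] * (cells_y * cells_x * bins)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            cy, cx = y // cell, x // cell
            if cy >= cells_y or cx >= cells_x:
                continue  # skip pixels outside the last complete cell
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = min(int(angle / (180.0 / bins)), bins - 1)
            desc[(cy * cells_x + cx) * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm else desc

def descriptor_similarity(a, b):
    """Cosine similarity of two L2-normalized descriptors (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

# Toy 16x16 "pages": vertical stripes versus horizontal stripes.
vertical = [[255 if (x // 4) % 2 else 0 for x in range(16)] for y in range(16)]
horizontal = [[255 if (y // 4) % 2 else 0 for x in range(16)] for y in range(16)]

d_vertical = hog_descriptor(vertical)
d_horizontal = hog_descriptor(horizontal)
same = descriptor_similarity(d_vertical, d_vertical)
different = descriptor_similarity(d_vertical, d_horizontal)
```

For real screenshots, a library implementation such as `skimage.feature.hog` from scikit-image is likely a better choice than this hand-rolled version. Cosine similarity is only one possible way to score two descriptors; Euclidean distance would work as well.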

Upvotes: 0
