Patrick

Reputation: 4905

How to find "related items" in PHP

We often see "related items". For instance, blogs have related posts, book sites have related books, etc. My question is: how is that relevancy computed? If it's based only on tags, I often see related items that do not share the same tag. For instance, when searching for 'pink', a related item could have a 'purple' tag.

Does anyone have any idea?

Upvotes: 16

Views: 4404

Answers (8)

darpet

Reputation: 3131

Here is an implementation of the Jaccard index between two texts, based on bigrams: https://packagist.org/packages/darkopetreski/textcategorization
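For readers who want to see the idea without installing anything, here is a minimal sketch of a bigram-based Jaccard index. It is not the code from the package linked above. Using character bigrams (rather than word bigrams), lowercasing and whitespace normalization are all assumptions, and the function names `bigrams` and `bigram_jaccard` are made up for illustration. It is written in Python for brevity; the same set operations port directly to PHP arrays.

```python
def bigrams(text):
    """Return the set of character bigrams of a text.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to a single space before the bigrams are extracted.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 2] for i in range(len(normalized) - 1)}


def bigram_jaccard(text_a, text_b):
    """Jaccard index of the two bigram sets: shared / distinct overall."""
    a, b = bigrams(text_a), bigrams(text_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here when neither text has a bigram
    return len(a & b) / len(union)


print(bigram_jaccard("pink shoes", "pink shirt"))
print(bigram_jaccard("pink shoes", "rubber ducky"))
```

Texts that share many letter pairs score close to 1, unrelated texts score close to 0, so sorting candidate items by this score gives a rough "related by wording" list.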

Upvotes: 0

hans

Reputation: 21

To get a simple list of related items based on tags, the basic solution goes like this:

3 tables, one with items, one with tags and one with the connection. The connection table consists of two columns, one for each id from the remaining tables. An entry in the connection table links a tag with an item by putting their respective ids in a row.

Now, to get that list of related items.

Fetch all items that share at least one tag with the original item. Be sure to fetch the tags along with the items, and then use a simple rating mechanism to determine which item shares the most tags with the original one: each shared tag increases the relation relevancy by one.

Depending on your tagging habits, it might be smart to add a counter-mechanism to prevent large, overarching tags from skewing the relevancy. To achieve this, you could give greater weight to tags that have been applied fewer times than a certain threshold. A threshold that has generally worked nicely for me is total_number_of_tag_appliances/total_number_of_tags, which is the average number of times a tag is applied. If a tag has been applied fewer times than that average, it increases the relation relevancy by two instead of one.
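The scoring described above can be sketched roughly as follows. This is only an illustration under some assumptions: the tags are already loaded into a dictionary mapping each item to its set of tags (in practice this would come from the three tables via a join), a tag's usage count is taken to be the number of items carrying it, and the names `related_by_tags` and `item_tags` are hypothetical. It is written in Python for brevity; the logic ports directly to PHP arrays.

```python
from collections import Counter


def related_by_tags(item_id, item_tags):
    """Score other items by shared tags; rarer-than-average tags count double."""
    # How many items each tag is applied to.
    usage = Counter(tag for tags in item_tags.values() for tag in set(tags))
    if not usage:
        return []
    # total tag applications / number of distinct tags
    average = sum(usage.values()) / len(usage)

    target = set(item_tags[item_id])
    scores = {}
    for other, tags in item_tags.items():
        if other == item_id:
            continue
        shared = target & set(tags)
        if shared:
            # +2 for a tag used less often than average, +1 otherwise
            scores[other] = sum(2 if usage[t] < average else 1 for t in shared)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


item_tags = {
    "post1": {"php", "mysql", "tags"},
    "post2": {"php", "tags"},
    "post3": {"php", "css"},
    "post4": {"php"},
    "post5": {"php", "mysql"},
}
for other, score in related_by_tags("post1", item_tags):
    print(other, score)
```

In this sample data "php" is on every post, so it contributes only one point, while the rarer "tags" and "mysql" contribute two each. Items that share no tag with the original are left out entirely.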

Upvotes: 2

Oto Brglez

Reputation: 4193

This is my implementation (Gist) of the Jaccard index with PostgreSQL and Ruby on Rails...

Upvotes: 0

Bakhtiyor

Reputation: 7318

I would say they use an ontology for that, which also adds other useful features to the application.

Upvotes: 1

andyk

Reputation: 10008

Here are some of the ways:

  1. Manually connect them. Set up a table with the fields item_id and related_item_id, then build an interface to insert the connections. This is useful for relating two items that belong together but have no resemblance or do not share a category/tag (or that live in an uncategorized entry table). Example: bath tub and rubber ducky.
  2. Pull up some items that belong to the same category or have a similar tag. The idea is that those items must be somewhat related, since they are in the same category. Example: on a page showing LCD monitors, the "Related items" section lists random LCD monitors (with the same price range/manufacturer/resolution).
  3. Do a text search matching the current item's name (and/or description) against other items in the table. You get the idea.

Upvotes: 4

sfrench

Reputation: 910

There are many ways to calculate similarity of two items, but for a straightforward method, take a look at the Jaccard Coefficient.

http://en.wikipedia.org/wiki/Jaccard_index

Which is: J(a,b) = |intersection(a,b)| / |union(a,b)|, i.e. the number of shared elements divided by the number of distinct elements overall.

So let's say you want to compute the coefficient of two items:

Item A, which has the tags  "books, school, pencil, textbook, reading"
Item B, which has the tags  "books, reading, autobiography"

intersection(A,B) = books, reading
union(A,B) = books, school, pencil, textbook, reading, autobiography

so J(a,b) = 2/6 ≈ 0.333

So the most related item to A would be the item which results in the highest Jaccard Coefficient when paired with A.
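The calculation above can be sketched in a few lines. This is only an illustration: the function names `jaccard` and `most_related`, the extra item C, and returning 0 for two empty tag sets are assumptions, not part of the definition. It is written in Python for brevity; in PHP the same thing can be done with array_intersect and array_unique over merged tag arrays.

```python
def jaccard(a, b):
    """Jaccard coefficient of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty tag sets
    return len(a & b) / len(union)


def most_related(item_id, tags_by_item, limit=5):
    """Rank every other item by its Jaccard coefficient with item_id."""
    target = tags_by_item[item_id]
    scored = [
        (other, jaccard(target, tags))
        for other, tags in tags_by_item.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


tags_by_item = {
    "A": ["books", "school", "pencil", "textbook", "reading"],
    "B": ["books", "reading", "autobiography"],
    "C": ["pencil", "eraser"],  # hypothetical extra item for comparison
}
print(most_related("A", tags_by_item))
```

Here B ranks above C for item A, because A and B share 2 of 6 distinct tags while A and C share only 1 of 6.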

Upvotes: 30

Natrium

Reputation: 31174

It can also be based on "people who bought this book also bought".

No matter how you do it, you will need some sort of connection between your items, and those connections will mostly be made by human beings.
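A rough sketch of the "also bought" idea is to count how often two items appear in the same order. The order data, the function names `co_purchase_counts` and `also_bought`, and the use of plain counts (rather than any normalization for very popular items) are assumptions made for illustration. It is written in Python for brevity.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """For each item, count how often every other item shares an order with it."""
    counts = {}
    for order in orders:
        # set() ignores duplicate lines within one order
        for a, b in combinations(sorted(set(order)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(item, counts, limit=3):
    """Items most frequently bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(limit)]


orders = [
    ["bath tub", "rubber ducky"],
    ["bath tub", "rubber ducky", "towel"],
    ["towel", "soap"],
]
counts = co_purchase_counts(orders)
print(also_bought("bath tub", counts))
```

In this approach the human-made connections are the purchases themselves, so no tags or manual links are needed, but it only works once there is enough order history.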

Upvotes: 0

Sarfraz

Reputation: 382706

It can be more than a tag; for example, it can be the average of each word appearing in a paragraph, and then in titles, etc.

Upvotes: 1
