Dru

Reputation: 9820

"Related Items" in Ruby

How can I implement a 'related items' feature for posts in a blog? I would like to return a list of similar posts based on analysis of post titles.

My own ideas for doing this seem very inefficient, and I wonder whether there are tools that already support this functionality. I didn't find anything helpful via Google or Ruby Toolbox, and I also looked at the Sunspot API. How would you achieve this in your blog application or content site?

Update

For those interested in this functionality, I decided to go with Sunspot, which lets me use this in my show action:

@find_related = Post.search do 
  fulltext params[:title]
end 

The results of that search are then available as an array of related posts:

@related = @find_related.results

Thanks for all the feedback. This railscast was a big help.

Upvotes: 2

Views: 176

Answers (4)

farnoy

Reputation: 7776

Sure, there are good, efficient tools for that! Technically, what you want is a full-text search over an indexed database of post titles and other data. There are tools that run an external database which handles all the searching and indexing. Those backends are general-purpose and not written in Ruby; your app only contains the client logic. That's very efficient, and you are unlikely to implement anything better than the existing algorithms yourself. I would recommend the following:

These libraries provide the client logic for exchanging data with the above-mentioned search engines (all of which are from the Apache Foundation).

Upvotes: 2

Linuxios

Reputation: 35803

If you are going by the words in the title, this crude, simplistic solution might serve as a stepping stone toward something production-ready:

# Assume titles is an array of arrays of the words of each title,
# and title is the one we are trying to match against.
HOW_MANY_RELATED_WORDS = 3
title_words = title.split(' ')
related = []
titles.each do |t|
  # Count the words this title shares with the target title.
  # The count is computed per title, so it does not carry over between titles.
  matches = t.count { |word| title_words.include?(word) }
  related << t.join(' ') if matches >= HOW_MANY_RELATED_WORDS
end

Upvotes: 1

Devin M

Reputation: 9752

There are two ways to do this, each with its own pros and cons.

The easy way would involve tagging your posts with keywords, using those to pull other articles that have matching tags, and then sorting the results by the number of identical tags. As long as the tags you place on the content represent it well, this produces good results without many false positives. As far as I know, this is how many blogging platforms implement the feature.

The more complex method would involve using NLP to parse the title of each post and calculate how closely it fits another post. This would involve writing more code and may produce false positives. However, you wouldn't have to tag posts by hand, and you can tweak the methods used to find posts if you want to weight certain words or phrases. Take a look at Treat, the Text Retrieval, Extraction and Annotation Toolkit; it seems like a good starting point for Ruby NLP.
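To give a rough idea of what "fitness to another post" could look like in code, here is a minimal sketch that does not use Treat or any real NLP. It swaps in a much simpler technique, Jaccard similarity over the sets of title words. The stop-word list, the method names (title_tokens, title_similarity, related_titles), and the limit and threshold defaults are all assumptions made up for illustration.

```ruby
require 'set'

# Hypothetical stop-word list; extend it for real content.
STOP_WORDS = Set.new(%w[a an and in of on the to with]).freeze

# Lowercase the title, split it into words, and drop stop words.
def title_tokens(title)
  title.downcase.scan(/[a-z0-9']+/).reject { |w| STOP_WORDS.include?(w) }.to_set
end

# Jaccard similarity: shared words divided by all distinct words.
def title_similarity(a, b)
  ta = title_tokens(a)
  tb = title_tokens(b)
  union = ta | tb
  return 0.0 if union.empty?
  (ta & tb).size.to_f / union.size
end

# Return the candidate titles most similar to title, best match first.
# limit and threshold are arbitrary defaults, not tuned values.
def related_titles(title, candidates, limit: 5, threshold: 0.2)
  candidates
    .reject { |c| c == title }
    .map { |c| [c, title_similarity(title, c)] }
    .select { |_, score| score >= threshold }
    .sort_by { |_, score| -score }
    .first(limit)
    .map(&:first)
end
```

Calling related_titles("Ruby on Rails tips", all_titles) would then return up to five titles that share enough words with the given one. Weighting certain words, as mentioned above, would mean replacing the plain set sizes with weighted sums.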

Upvotes: 1

Roland Mai

Reputation: 31077

I am not sure whether your requirements allow this, but blog posts generally have tags.

You could use the tags in your blog posts as a way of filtering other related posts, because posts with similar tags should be related somehow. You can then sort by count of matched tags and latest date published.
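As a rough sketch of that sorting, assuming each post exposes an array of tags and a publication date: the BlogPost struct, its field names, and the related_by_tags method are hypothetical stand-ins for whatever model your application actually uses.

```ruby
require 'date'

# Hypothetical post record; a real app would use its own model.
BlogPost = Struct.new(:title, :tags, :published_on)

# Rank other posts by the number of tags shared with post,
# breaking ties by most recent publication date.
def related_by_tags(post, all_posts, limit: 5)
  all_posts
    .reject { |other| other.equal?(post) }                  # skip the post itself
    .map { |other| [other, (other.tags & post.tags).size] } # count shared tags
    .select { |_, shared| shared > 0 }                      # keep only posts with a match
    .sort_by { |other, shared| [-shared, -other.published_on.jd] }
    .first(limit)
    .map(&:first)
end
```

In a database-backed app you would probably push the counting and sorting into a query instead of doing it in memory, but the ordering logic stays the same.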

Upvotes: 1
