user2058002

Finding similar posts with PostgreSQL

I have a table posts:

CREATE TABLE posts (
  id serial primary key,
  content text
);

When a user submits a post, how can I compare it with the existing posts and find similar ones?
I'm looking for something like what Stack Overflow does with "Similar Questions".

Upvotes: 2

Views: 492

Answers (2)

Erwin Brandstetter

Reputation: 656666

While Text Search is an option, it is not primarily meant for this type of search. Its typical use case is finding words in a document based on dictionaries and stemming, not comparing whole documents.

I am sure Stack Overflow has put some smarts into its similarity search, as this is not a trivial matter.

You can get halfway decent results with the similarity function and operators provided by the pg_trgm module:

SELECT content, similarity(content, 'grand new title asking foo') AS sim_score
FROM   posts
WHERE  content  % 'grand new title asking foo'
ORDER  BY 2 DESC, content;

Be sure to have a GiST index (operator class gist_trgm_ops) on content for this.
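A minimal setup sketch for the query above, assuming the pg_trgm contrib module is installed on the server. The index name posts_content_trgm_idx is made up for illustration.

```sql
-- pg_trgm must be enabled once per database before similarity() and % are available.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GiST index so the % operator can use an index instead of a sequential scan.
-- The index name is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX posts_content_trgm_idx ON posts USING gist (content gist_trgm_ops);

-- The % operator is true when similarity() exceeds the current threshold (0.3 by default).
SELECT show_limit();
```

If too many or too few rows match, the threshold can be adjusted with set_limit(), for example SELECT set_limit(0.2); for the current session.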

But you'll probably have to do more. You could combine it with Text Search after identifying keywords in the new content.
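One way such a combination might look, as a rough sketch only: the keywords 'title' and 'foo' are assumed to have been extracted from the new post by the application beforehand (how to pick them is not covered here), and the 'english' text search configuration is likewise an assumption.

```sql
-- Pre-filter candidates with Text Search on the extracted keywords,
-- then rank the survivors by trigram similarity to the whole new post.
-- 'title | foo' is a hypothetical keyword list: match posts containing either word.
SELECT id, content,
       similarity(content, 'grand new title asking foo') AS sim_score
FROM   posts
WHERE  to_tsvector('english', content) @@ to_tsquery('english', 'title | foo')
ORDER  BY sim_score DESC
LIMIT  10;
```

For the Text Search filter to be fast on a large table, an expression index such as a GIN index on to_tsvector('english', content) would typically be needed as well.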

Upvotes: 5

Neil McGuigan

Reputation: 48256

You need to use Full Text Search in Postgres.

http://www.postgresql.org/docs/9.1/static/textsearch-intro.html

Upvotes: 0
