Ali

Reputation: 2042

Generate keywords for contents through Solr

I'm integrating Solr for my new PHP application.

As I'm new to Solr, I'd like to know whether it is possible to generate useful tags for every content page through Solr, something like an auto-tagging mechanism.

Thanks in advance.

P.S. My content is available in both Persian and English.

Upvotes: 2

Views: 819

Answers (2)

Anis

Reputation: 3599

Since it's a PHP application, if you're fine with generating the tags in PHP and then inserting or updating them in Solr, here are a few options:

  • If using a web service is OK, check Yahoo's Term Extractor
  • If you can or want to host a term extraction service yourself (maybe on a local server), check FiveFilters
  • Here is a PHP function for extracting valuable words from a block of text. It is surely not as efficient as Yahoo's Term Extractor, but it may work for you.
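To give an idea of what the third option boils down to, here is a minimal sketch of frequency-based keyword extraction. It is written in Python for brevity and is not the linked function; the name `extract_keywords`, the tiny stop-word list and the thresholds are all assumptions made for illustration. For Persian content you would need a Persian stop-word list as well.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption). A real list would be much
# larger, and Persian text would need its own list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def extract_keywords(text, top_n=5, min_length=3):
    """Return up to top_n of the most frequent non-stop-words in text."""
    # \w+ is Unicode-aware in Python 3, so it also splits Persian words.
    words = re.findall(r"\w+", text.lower())
    candidates = [w for w in words if len(w) >= min_length and w not in STOP_WORDS]
    return [word for word, _ in Counter(candidates).most_common(top_n)]
```

The resulting list can be sent to Solr as a multi-valued field when the document is indexed. Plain word frequency is crude compared with a real term extractor, so treat it as a starting point only.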

Upvotes: 1

The Bndr

Reputation: 13394

something like auto-tagging mechanism.

Yes, you can build something like that.

There are two different ways to do that:

  1. Use Solr's Clustering Component to build groups of documents and let Solr label those groups. The labels are something like the tags you are looking for.
  2. Implement tagging with the MLT (MoreLikeThis) feature.
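For the first approach, a rough sketch of turning cluster labels into per-document tags could look like the following (Python, for illustration only). It assumes the clustering response has already been parsed from JSON and contains a `clusters` list whose entries carry `labels` and `docs`; check that shape against your Solr version. The function name `labels_per_document` is hypothetical.

```python
from collections import defaultdict


def labels_per_document(solr_response):
    """Map each document id to the labels of the clusters it belongs to.

    Assumes solr_response is a dict with a "clusters" list, where each cluster
    has "labels" (list of strings) and "docs" (list of document ids).
    """
    tags = defaultdict(list)
    for cluster in solr_response.get("clusters", []):
        for doc_id in cluster.get("docs", []):
            tags[doc_id].extend(cluster.get("labels", []))
    return dict(tags)
```

A document that appears in several clusters collects the labels of all of them, which is why the labels usually need some filtering before they are usable as tags.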

I started an auto-tagging project with method 1, with medium success. Finding labels for a cluster of documents is a hard process.
Fortunately, I already had some tagged documents. If you also have some documents with valid tags, then you can use method 2 and take those documents as a base to learn from:

Take a document without tags and perform an MLT search against the documents that have tags. Take the tags from the documents you found and count them. Depending on the counts, apply one or more tags to the untagged document. In my case, that works very well. Method 2 is a cheap implementation of machine learning, but you will get 95% success with only 5% of the work.
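The counting step described above can be sketched like this (Python, for illustration only). The field names `id`, `content` and `tags`, the thresholds, and both function names are assumptions; `mlt_params` only builds query parameters for Solr's MoreLikeThis handler and does not send a request.

```python
from collections import Counter


def mlt_params(doc_id, text_field="content", rows=10):
    """Build parameters for a MoreLikeThis request limited to tagged docs.

    Field names are assumptions; adapt them to your schema.
    """
    return {
        "q": f"id:{doc_id}",    # the untagged source document
        "mlt.fl": text_field,   # field used to measure similarity
        "fq": "tags:[* TO *]",  # keep only documents that already have tags
        "fl": "id,tags",
        "rows": rows,
        "wt": "json",
    }


def suggest_tags(similar_docs, min_votes=2, max_tags=3):
    """Count the tags of the similar documents and keep the most frequent ones."""
    votes = Counter(tag for doc in similar_docs for tag in doc.get("tags", []))
    return [tag for tag, count in votes.most_common(max_tags) if count >= min_votes]
```

`min_votes` and `max_tags` correspond to the "depending on the count" part: raising `min_votes` gives fewer but more reliable tags.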

Upvotes: 2
