Sam

Reputation: 81

Is Solr a Good Solution for the Problem Described Below?

I am volunteering for a non-profit, and the CEO would like an application that stores resumes of university professors. The resumes are to be searchable so that prospective employers can find them. The resumes could be in a variety of formats, including PDF or Word.

The Web site is currently based on Joomla!, but may move to Drupal. In either case, the developers are familiar with PHP. I am familiar with PHP as well as Java.

What is the best architecture for this application? I am considering:

  1. Installing either the Java or PHP version of Solr and linking to it through PHP, using the PHP Solr extension.
  2. Using the PHP version of Lucene directly and bypassing Solr.
  3. Using the Search Lucene API Drupal extension, which provides Solr-like functionality.

If I have left any possibilities out, please let me know.

Also, I couldn't find a good book on Solr on Amazon, though maybe you can recommend one. There is a good one on Lucene (in the In Action series), but it only briefly mentions Solr. Is it worthwhile reading a good book on Lucene in order to understand how to use Solr better, or would I be wasting my time and money?

Upvotes: 3

Views: 1158

Answers (3)

Yavar

Reputation: 11933

Solr is a great option; however, based on your requirements, I suggest you go with the Sphinx search engine, which has an excellent, extremely well-documented PHP API. Note that I love Solr for some of its great features, but Solr can't beat Sphinx with respect to indexing algorithms (i.e., index time and index size on disk).

There is an excellent book available on Solr: Solr 1.4 Enterprise Search Server (Packt Publishing). You can also read the great IBM developerWorks article on Solr; search Google for "Searching Smart with Solr IBM Developerworks".

PS: I still feel Sphinx would be the best choice for you.

Upvotes: 0

Mauricio Scheffer

Reputation: 99730

Yes, Solr is a good match:

  • Solr comes out of the box with a feature called ExtractingRequestHandler, which lets you easily index Word, PDF and other proprietary formats.
  • Solr is highly configurable when it comes to full-text searching; you'll probably get better results than with MySQL full-text.
  • Solr is fast. MySQL full-text, not so much.
  • Solr enables faceted navigation.
  • There are two Joomla integration modules for Solr (JSolr, TNR ESearch) and one for Drupal.
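To make the ExtractingRequestHandler point more concrete, here is a minimal sketch of how the request URL for indexing one resume file might be put together. It assumes the handler is mapped at `/update/extract` (as in Solr's example configuration); the core URL, the document id, and the `department` field are hypothetical. Only the URL is built here; actually uploading the PDF or Word file as the request body is left out.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, extra_fields=None, commit=True):
    """Build the URL for posting one file to Solr's extracting handler."""
    # literal.<field> attaches a fixed value to the indexed document,
    # alongside whatever text is extracted from the file itself.
    params = {"literal.id": doc_id}
    for name, value in (extra_fields or {}).items():
        params["literal." + name] = value
    if commit:
        # Ask Solr to commit right away so the document becomes searchable.
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Hypothetical core URL, id, and field name, for illustration only.
url = build_extract_url(
    "http://localhost:8983/solr/resumes",
    "prof-42",
    {"department": "Physics"},
)
print(url)
```

The same URL could just as well be assembled from PHP; the point is only that the file goes to one endpoint and any metadata travels as `literal.*` parameters.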

Choosing Solr is not just about its performance; it's also about its features and flexibility.
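As one example of those features, faceted navigation is driven by a few extra query parameters. The sketch below builds a search URL with `facet=true` and one `facet.field` per facet; the `/select` path is Solr's usual search handler, while the core URL and the `department` and `rank` fields are hypothetical and would have to exist in your schema.

```python
from urllib.parse import urlencode


def build_facet_query_url(base_url, text, facet_fields, rows=10):
    """Build a Solr search URL that also requests facet counts."""
    params = [("q", text), ("rows", rows), ("facet", "true")]
    # One facet.field parameter per field to count matching values for.
    params += [("facet.field", name) for name in facet_fields]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL and field names, for illustration only.
search_url = build_facet_query_url(
    "http://localhost:8983/solr/resumes",
    "machine learning",
    ["department", "rank"],
)
print(search_url)
```

The response would then carry, next to the matching resumes, a count per department and per rank that a page can render as "narrow your search" links.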

About Solr books, see:

Books about Lucene will help you understand how text is processed under the hood, which may come in handy if you have to fine-tune text analysis. However, I'd recommend starting with a book about Solr.

Upvotes: 2

Layke

Reputation: 53146

Based on what you have explained, no, Solr is not a good match.

You would be more than capable of doing full-text searching through MySQL if you needed to. The fact that you mention Joomla and Drupal suggests that MySQL is the RDBMS you are using.

If I were to start this project afresh, I would probably use a NoSQL engine, something like MongoDB, to create my resume documents. www.mongodb.com

That is how I would persist my data.
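For illustration only, a resume stored as a document might look something like the sketch below. Every field name here is hypothetical and not taken from the answer; the code uses only the standard library and simply produces the JSON-style structure that a document store such as MongoDB would accept.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Resume:
    # All field names are hypothetical, chosen only for this example.
    professor_name: str
    university: str
    department: str
    file_name: str
    keywords: list = field(default_factory=list)


resume = Resume(
    professor_name="Jane Doe",
    university="Example University",
    department="Physics",
    file_name="jane-doe.pdf",
    keywords=["optics", "lasers"],
)

# asdict() turns the dataclass into a plain dict, i.e. one "document".
doc = asdict(resume)
print(json.dumps(doc, indent=2))
```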

When it then comes to searching the documents, I would only consider using Solr if I expected thousands or tens of thousands of searches a day. Implementing a Solr application really isn't worth the effort if you only expect 100-1000 searches a day.

And, to answer your book question on Solr, the book which I own and would recommend is http://www.packtpub.com/solr-1-4-enterprise-search-server/book but I'm sure you could find something a little more recent. I bought that about 18-24 months ago.
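To put those daily figures in perspective, a rough conversion to an average rate looks like this. It is a back-of-the-envelope sketch that assumes searches are spread evenly over the day, which real traffic is not, so peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_searches_per_second(searches_per_day):
    """Average rate, assuming searches are spread evenly over 24 hours."""
    return searches_per_day / SECONDS_PER_DAY


for per_day in (100, 1_000, 10_000):
    rate = average_searches_per_second(per_day)
    print(f"{per_day:>6} searches/day is about {rate:.4f} searches/second")
```

Even the upper figures average well under one search per second, which is the scale the load argument above is reasoning about.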

You first would want to store the details of each person... so

Upvotes: -1
