SunnyShah
SunnyShah

Reputation: 30487

Indexing algorithms to develop an app like google desktop search?

I want to develop google desktop search like application, I want to know that which Indexing Techniques/ Algorithms I should use so I can get very fast data retrival.

Upvotes: 2

Views: 3309

Answers (2)

Nick Johnson
Nick Johnson

Reputation: 101149

In general, what you want is an Inverted Index. You can do the indexing yourself, but it's a lot of work to get right - you need to handle stemming, stop words, extending the posting list to include positions in the document so you can handle multi-word queries, and so forth. Then, you need to store the index, probably in a B-Tree on disk - or you can make life easier for yourself by using an existing database for the disk storage, such as BDB. You also need to write a query planner that interprets user queries, performs query expansion and converts them to a series of index scans. Wikipedia's article on Search Engine Indexing provides a good overview of all the challenges, too.

Or, you can leverage existing work and use ready-made full text indexing solutions like Apache Lucene and Compass (which is built on Lucene). These tools handle practically everything detailed above (and more), which just leaves you writing the tool to build and update the index by feeding all your documents into Lucene, and the UI to allow users to search it.

Upvotes: 8

David Crawshaw
David Crawshaw

Reputation: 10577

The Burrows-Wheeler transform, used to compress data in bzip2, can be used to make substring searching of text a constant time function.

http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

I haven't seen a simple introduction online, but here is a lot of detail:

http://www.ddj.com/architect/184405504

Upvotes: 4

Related Questions