spamup
spamup

Reputation: 466

Fast search algorithm

  1. Let's have tons of posts.
  2. As a user, I want to find all posts containing the words "hello" and "world".
  3. Let's say there is a post with this text "Hello world, this place is beautiful".

Now:

To reduce the quantity of possible candidates I was thinking about this:

for each post (
if number_of_search_words == number_of_post_words -> proceed with search logic
if number_of_search_words < number_of_post_words -> proceed with search logic
if number_of_search_words > number_of_post_words -> don't proceed with search logic
)

but that would also require an number containing the quantity of words of each post, which leads to more complexity.

Is there an elegant way of solving this?

Upvotes: 0

Views: 177

Answers (1)

olegarch
olegarch

Reputation: 3891

You must to use bit containers, for example, BitMagic.

  • Initially, you assign to each post some sequenced integer ID, postID.
  • Thereafter, create N bit containers (N = quantity of search words), each size is maximal postID.
  • Thereafter, build indices: parse each post, and for each term from the post, set bit1 in the term-associated container, with postID as index.

To search:

  • get bit containers for your words "hello", "word".
  • AND those bit containers.
  • Result container will contains bit 1's for PostIDs, contains both search terms.

Upvotes: 1

Related Questions