baek

Reputation: 450

Full-text search or full-text + DB queries

So, I've been trying to find out how full-text search is actually implemented in production. Most articles highlight the technologies and show a baseline implementation. However, I'd presume any real application indexes two or more fields. In this specific case, considering a move towards serverless applications and NoSQL DBs, from what I can tell there are two approaches. Let's use the following data set as an example:

{
  "id": 1,
  "title": "iPhone 9",
  "description": "An apple mobile which is nothing like apple",
  "price": 549,
  "discountPercentage": 12.96,
  "rating": 4.69,
  "stock": 94,
  "brand": "Apple",
  "category": "smartphones",
  "thumbnail": "https://i.dummyjson.com/data/products/1/thumbnail.jpg",
  "images": [
    "https://i.dummyjson.com/data/products/1/1.jpg",
    "https://i.dummyjson.com/data/products/1/2.jpg",
    "https://i.dummyjson.com/data/products/1/3.jpg",
    "https://i.dummyjson.com/data/products/1/4.jpg",
    "https://i.dummyjson.com/data/products/1/thumbnail.jpg"
  ]
}

For a product search to work:

Assumptions

FTS engine: e.g. Algolia, Meilisearch, Typesense, etc.

NoSQL DB: e.g. Cloud Firestore, MongoDB Atlas, etc.

Approach 1

Index all fields in the FTS engine and use it as the data source. In this case, no calls are made to the DB the data was originally indexed from. This assumes the FTS engine and the DB are kept in sync in real time.

Pros: No DB calls, which is especially helpful if you pay per read with a NoSQL DB such as Firestore.

Cons: As the data grows, so does your FTS engine infrastructure (memory, storage, or both). Indexing all fields also means potential performance hits from fields nobody searches, e.g. the URLs in the example above.
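To make Approach 1 concrete, here is a minimal sketch in which a toy in-memory inverted index stands in for a real engine such as Algolia, Meilisearch or Typesense. `FullDocumentIndex` and `tokenize` are hypothetical names, not any engine's API. The index stores whole documents and indexes every field, so search results need no DB call, but the URL fields end up searchable too (a query for "jpg" matches the product), which is the wasted-index cost described in the Cons.

```python
import re
from collections import defaultdict

# The example product from the question (images list shortened).
PRODUCT = {
    "id": 1,
    "title": "iPhone 9",
    "description": "An apple mobile which is nothing like apple",
    "price": 549,
    "discountPercentage": 12.96,
    "rating": 4.69,
    "stock": 94,
    "brand": "Apple",
    "category": "smartphones",
    "thumbnail": "https://i.dummyjson.com/data/products/1/thumbnail.jpg",
    "images": [
        "https://i.dummyjson.com/data/products/1/1.jpg",
        "https://i.dummyjson.com/data/products/1/2.jpg",
    ],
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class FullDocumentIndex:
    """Hypothetical stand-in for an FTS engine that stores whole documents."""

    def __init__(self) -> None:
        self.docs: dict[int, dict] = {}  # full copy of every document
        self.postings: dict[str, set[int]] = defaultdict(set)  # token -> doc ids

    def add(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc
        # Approach 1 indexes every field, including ones nobody searches (URLs, prices).
        for value in doc.values():
            for item in value if isinstance(value, list) else [value]:
                for token in tokenize(str(item)):
                    self.postings[token].add(doc["id"])

    def search(self, query: str) -> list[dict]:
        """Return full documents containing every query token; no DB call needed."""
        matched = None
        for token in tokenize(query):
            hits = self.postings.get(token, set())
            matched = set(hits) if matched is None else matched & hits
        return [self.docs[i] for i in sorted(matched or ())]


index = FullDocumentIndex()
index.add(PRODUCT)
print(index.search("apple mobile")[0]["price"])  # 549
print(len(index.search("jpg")))  # 1: matched only because the URL fields were indexed
```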

Approach 2

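Index only the required fields, e.g. title and description, then use the FTS response to query the DB and retrieve the rest of the data.

Pros: Only the fields that are actually searched are indexed, so the FTS engine doesn't carry the rest of the data.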

Cons: Application performance may suffer due to multiple remote calls, to both the FTS engine and the DB. Costs increase with pay-per-read NoSQL DBs.
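A minimal sketch of Approach 2, under the same toy assumptions: the hypothetical `IdOnlyIndex` indexes only `title` and `description` and returns document IDs, and the hypothetical `FakeDocumentDB` stands in for a pay-per-read store like Firestore or MongoDB by counting one read per document returned. Neither class mirrors a real SDK; the point is the two round trips and the read counter.

```python
import re
from collections import defaultdict

# Subset of the question's example product.
PRODUCT = {
    "id": 1,
    "title": "iPhone 9",
    "description": "An apple mobile which is nothing like apple",
    "price": 549,
    "brand": "Apple",
    "thumbnail": "https://i.dummyjson.com/data/products/1/thumbnail.jpg",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class IdOnlyIndex:
    """Hypothetical FTS stand-in that indexes a few fields and returns only IDs."""

    SEARCHABLE_FIELDS = ("title", "description")  # assumption: what users search on

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc: dict) -> None:
        for field in self.SEARCHABLE_FIELDS:
            for token in tokenize(doc.get(field, "")):
                self.postings[token].add(doc["id"])

    def search_ids(self, query: str) -> list[int]:
        matched = None
        for token in tokenize(query):
            hits = self.postings.get(token, set())
            matched = set(hits) if matched is None else matched & hits
        return sorted(matched or ())


class FakeDocumentDB:
    """Hypothetical stand-in for a pay-per-read NoSQL store."""

    def __init__(self) -> None:
        self.rows: dict[int, dict] = {}
        self.reads = 0  # billed document reads

    def put(self, doc: dict) -> None:
        self.rows[doc["id"]] = doc

    def get_many(self, ids: list[int]) -> list[dict]:
        found = [self.rows[i] for i in ids if i in self.rows]
        self.reads += len(found)  # one read per document returned
        return found


def search(index: IdOnlyIndex, db: FakeDocumentDB, query: str) -> list[dict]:
    ids = index.search_ids(query)  # remote call 1: FTS engine
    if not ids:
        return []  # skip the DB round trip when nothing matched
    return db.get_many(ids)  # remote call 2: database, billed per document


db = FakeDocumentDB()
index = IdOnlyIndex()
db.put(PRODUCT)
index.add(PRODUCT)

results = search(index, db, "iphone")
print(results[0]["thumbnail"])  # full document comes from the DB, not the index
print(db.reads)  # 1
print(index.search_ids("jpg"))  # []: URLs were never indexed
```

Skipping the DB call on an empty result is one small way to avoid paying for a round trip that can't return anything.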

In a production app where the data only keeps growing, what options are developers/architects considering?

  1. Do the benefits of using the FTS engine as the source of truth outweigh those of a combined approach?
  2. Are both approaches viable given the rate at which the data compounds (vertical infra scaling), versus designing an app that always performs FTS and DB queries efficiently?
  3. "This is a million dollar question" - Wouldn't you like to know?

P.S. I'm no expert at any of this, so apologies in advance if this doesn't make sense.

Upvotes: 0

Views: 242

Answers (1)

Alex Mamo

Reputation: 138834

In the production app where the data is only increasing. What are some options developers/architects are considering?

It's not that one approach is better than the other. To choose between them, you have to know some details about your use case, so most likely you'll need to measure both in terms of speed and cost.
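One way to start that measurement is a back-of-the-envelope model. This sketch assumes pay-per-read billing where Approach 2 costs one DB read per hit returned and Approach 1 costs none; `Workload`, the traffic numbers and `price_per_100k_reads` are placeholders to replace with your own figures and your provider's pricing. `payload_bytes` compares how much data each approach sends to the FTS engine, which is only a rough proxy for index size. It does not model FTS hosting cost or latency.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The example product from the question (images list shortened).
PRODUCT = {
    "id": 1,
    "title": "iPhone 9",
    "description": "An apple mobile which is nothing like apple",
    "price": 549,
    "brand": "Apple",
    "category": "smartphones",
    "thumbnail": "https://i.dummyjson.com/data/products/1/thumbnail.jpg",
    "images": ["https://i.dummyjson.com/data/products/1/1.jpg"],
}


@dataclass
class Workload:
    searches_per_month: int
    hits_per_search: int  # documents shown per results page
    price_per_100k_reads: float  # placeholder: take it from your DB's pricing page


def db_reads_per_month(w: Workload, approach: int) -> int:
    """Approach 1 serves hits from the FTS engine; Approach 2 reads each hit from the DB."""
    if approach == 1:
        return 0
    return w.searches_per_month * w.hits_per_search


def db_read_cost(w: Workload, approach: int) -> float:
    return db_reads_per_month(w, approach) / 100_000 * w.price_per_100k_reads


def payload_bytes(doc: dict, fields: Optional[tuple[str, ...]] = None) -> int:
    """Serialized size of what gets sent to the FTS engine (real indexes add overhead)."""
    subset = doc if fields is None else {k: doc[k] for k in fields}
    return len(json.dumps(subset).encode("utf-8"))


w = Workload(searches_per_month=1_000_000, hits_per_search=20, price_per_100k_reads=1.0)
print(db_reads_per_month(w, 1), db_reads_per_month(w, 2))  # 0 20000000
print(db_read_cost(w, 2))  # 200.0, in units of the placeholder price
print(payload_bytes(PRODUCT), payload_bytes(PRODUCT, ("id", "title", "description")))
```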

If you can solve the search with what Firestore provides, then you don't necessarily need a third-party service like the ones you mentioned in your question. If you need a more complex search, then you have to choose one of the available Firebase Extensions for search, to make your life easier.
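Firestore doesn't offer native full-text search; a commonly used workaround for simple cases is a prefix query made of two range filters on the same field: `title >= prefix` and `title <= prefix + "\uf8ff"`. The sketch below emulates that range in memory with `bisect` rather than calling the Firestore SDK, so it runs without a project; the titles are hypothetical. It also shows why this only goes so far: matching is case-sensitive and anchored at the start of the string, which is where a search extension becomes worth it.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample titles, sorted by code point (matches byte order for ASCII).
TITLES = sorted(["iPhone 9", "iPhone X", "iPad", "Pixel 7"])

HIGH_CHAR = "\uf8ff"  # a very high code point used as the upper sentinel


def prefix_range(sorted_titles: list[str], prefix: str) -> list[str]:
    # Emulates a range query on one field:
    #   title >= prefix AND title <= prefix + HIGH_CHAR
    lo = bisect_left(sorted_titles, prefix)
    hi = bisect_right(sorted_titles, prefix + HIGH_CHAR)
    return sorted_titles[lo:hi]


print(prefix_range(TITLES, "iPh"))  # ['iPhone 9', 'iPhone X']
print(prefix_range(TITLES, "iphone"))  # []: matching is case-sensitive
print(prefix_range(TITLES, "Phone"))  # []: only prefixes match, not substrings
```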

Maybe the technique that is described in the following article will help:

Upvotes: 1
