How does Google process 600K documents in 0.33 seconds?

Regardless of how fast their CPUs are, it seems impossible to process that many documents in 0.33 seconds.

So I believe it comes down to horizontal scaling. As a guess, how many servers were involved in this query that processed 600K documents in under a second?

Answers (1)

Stephen Ostermiller

Google doesn't process that many documents that quickly. Google pre-processes the documents well before you do your search. Google maintains a "search index" that is used to produce the list of search results.

You can think of a search index like the index in a paper book. For each word, it says what pages on the internet use it. For a query, it looks up each of the words in your query in the search index and creates a list of results from that.

For reference: What Is A Search Index And How Does It Work? - AddSearch
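
A toy version of the book-index idea, as a minimal Python sketch of an inverted index (not how Google actually implements it). The names build_index and search are hypothetical; real search indexes also store word positions, compress their lists, and rank results, none of which is shown here.

```python
from collections import defaultdict


def build_index(docs):
    """Build the index ahead of time: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """At query time, only look up the stored lists; no document text is read."""
    words = query.lower().split()
    if not words:
        return set()
    # Intersect starting from the shortest list to keep the work small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])  # copy so the stored index is not modified
    for posting in postings[1:]:
        result &= posting
    return result


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_index(docs)
print(sorted(search(index, "quick brown")))  # [1, 3]
print(sorted(search(index, "the dog")))      # [2]
```

The expensive part (reading every document) happens in build_index long before anyone searches, while search only touches the lists for the words in the query. That split is the point of the answer: the result count comes from the index, not from reading that many documents at query time.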

Google also has a lot of computers and does a ton of horizontal scaling. It has horizontal scaling for each of the stages of building the search index and displaying search results:

  • Crawling (Googlebot is a horizontally distributed web crawler)
  • Relevancy (Deciding how important each word is to the page)
  • Indexing (Creating the search index)
  • Reputation (Calculating how trusted each site and each page should be)
  • Spam and fraud detection (deciding what shouldn't be in the index)
  • Queries (against the search index)
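
To make the "queries against the search index" stage concrete, here is a minimal Python sketch of one common scaling pattern, assuming the index is split by document across several servers (shards): each shard searches only its own slice, and the partial results are merged. The Shard class and the build_shards and distributed_search helpers are hypothetical, threads stand in for separate machines, and this is not a description of Google's actual architecture.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical index server holding an inverted index for a slice of the documents."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.index[word].add(doc_id)

    def search(self, words):
        # Documents on this shard that contain every query word.
        postings = [self.index.get(w, set()) for w in words]
        return set.intersection(*postings) if postings else set()


def build_shards(docs, num_shards):
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        # Document partitioning: each document is indexed by exactly one shard.
        shards[doc_id % num_shards].add(doc_id, text)
    return shards


def distributed_search(shards, query):
    words = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Scatter: every shard searches its own slice in parallel
        # (threads stand in for separate machines here).
        partial_results = pool.map(lambda shard: shard.search(words), shards)
        # Gather: merge the per-shard answers into one result set.
        return set().union(*partial_results)


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
    4: "a brown dog",
}
shards = build_shards(docs, num_shards=2)
print(sorted(distributed_search(shards, "quick brown")))  # [1, 3]
print(sorted(distributed_search(shards, "brown dog")))    # [4]
```

Because each document lives on exactly one shard, intersecting within a shard and then taking the union across shards gives the same answer as one big index, while each machine handles only a fraction of the data. Real engines also replicate shards and rank results; this sketch leaves both out.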

But there is no amount of horizontal scaling that would allow search engines to process documents in real time based on your search query.
