How does Google process 600K documents in 0.33 seconds?

Question

Regardless of how fast their CPUs are, it seems impossible to process that many documents in 0.33 seconds.

So I believe that it comes down to horizontal scaling. As a guess, how many servers were involved in a query that processed 600K documents in under a second?

Stephen Ostermiller · Accepted Answer

Google doesn't process that many documents that quickly. Google pre-processes the documents well before you do your search. Google maintains a "search index" that is used to produce the list of search results.

You can think of a search index like the index in a paper book. For each word, it says which pages on the internet use it. For a query, Google looks up each word of the query in the search index and builds the list of results from those matches.
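To make the book-index analogy concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration, not how Google implements it: the function names `build_index` and `search`, the sample documents, and the choice to return only documents that contain every query word are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(documents):
    """Done ahead of time: map each lowercase word to the IDs of documents using it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Done at query time: look up each query word and intersect the ID sets."""
    words = query.lower().split()
    if not words:
        return set()
    # .get() avoids adding empty entries to the defaultdict for unknown words.
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "how search engines build an index",
    2: "a paper book has an index at the back",
    3: "search engines crawl the web",
}

index = build_index(documents)
print(search(index, "search index"))  # {1}
```

The expensive part, reading every document, happens once in `build_index`. A query only touches the entries for its own words, so its cost depends on the number of matching documents rather than on the total amount of text that was indexed.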

For reference: What Is A Search Index And How Does It Work? - AddSearch

Google also has a lot of computers and does a ton of horizontal scaling. It has horizontal scaling for each of the stages of building the search index and displaying search results:

Crawling (Googlebot is a horizontally distributed web crawler)
Relevancy (Deciding how important each word is to the page)
Indexing (Creating the search index)
Reputation (Calculating how trusted each site and each page should be)
Spam and fraud detection (Deciding what shouldn't be in the index)
Queries (against the search index)

But there is no amount of horizontal scaling that would allow search engines to process documents in real time based on your search query.
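As a rough illustration of horizontal scaling for the query stage, the sketch below splits a pre-built index into shards and asks every shard the same question in parallel, then merges the partial results. Everything in it is an assumption made for the example: the names `search_shard` and `search_all`, partitioning by document ID, and the use of threads as stand-ins for separate servers. It does not describe Google's actual architecture.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(documents):
    """Pre-processing step: map each lowercase word to the IDs of documents using it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_shard(index, words):
    """Return the documents in one shard that contain every query word."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


def search_all(shards, query):
    """Send the query to every shard in parallel and merge the partial results."""
    words = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, words), shards)
        return set().union(*partials)


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "google maintains a search index",
    2: "the index of a paper book",
    3: "a search index lists pages per word",
    4: "crawling the web",
}

# Partition by document ID: each shard indexes only its own documents.
shard_count = 2
shards = [
    build_index({i: t for i, t in documents.items() if i % shard_count == n})
    for n in range(shard_count)
]

print(search_all(shards, "search index"))  # {1, 3}
```

Each shard still answers from an index built in advance. Adding shards spreads the lookups across more machines, but it does not replace the pre-processing step.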
