Reputation: 91
http://infolab.stanford.edu/~backrub/google.html
I was reading Google's whitepaper on search engine implementation.
In google's system design, the Inverted Index is being sharded into Barrels, hopefully for faster lookups.
I am building my own search index just as practice and I have implemented by Inverted Index as Hash Map (giving me lookups in constant time complexity)
Now if I shard my inverted index into a bunch of Barrels, I still don't see any utility in using them over one monolithic Inverted Index.
Because when the Inverted Index object is instantiated, if all the barrels are deserialized from disk into RAM (loading barrels into RAM), the total size of all the Barrels would still be equivalent to the monolithic Inverted Index. In this case, using barrels over one Large Inverted Index is not particularly optimal in terms of computational complexity.
In a different case, where I only load a handful of Barrels into RAM, if I get a word in query that is present in an unloaded Barrel, that Barrel would've to be loaded which would take processing time, slowing down the search engine's performance.
What do you guys recommend me in this case?
Upvotes: 0
Views: 35
Reputation: 1
I think that the barrel size should be small enough so that the loading time can be reduced.
Upvotes: -1