Reputation: 11
I am working through some problems and have come across one that I think I understand, but I'd like to confirm. The question is about building a search engine over crawled documents, for which we downloaded the title, abstract, and main body. We want it to be fast, and we have virtually unlimited disk space at low cost. Given that, should we use doc values, inverted indexes, or a mixture of both (e.g. doc values for the title and an inverted index for the rest of the fields)?
I know that DocValues are typically used for grouping, sorting, and filtering. My argument was that even though storage space is cheap (DocValues use more storage), a search engine mainly performs full-text searches, which perform better and use less storage with inverted indexes. Hence we should use only inverted indexes.
Is this interpretation/analysis correct? I will have to implement this in code.
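To make the distinction I am reasoning about concrete, here is a minimal Python sketch of how I understand the two structures. It is not Lucene or Elasticsearch code: the toy corpus, the whitespace tokenizer, and the helper names `build_inverted_index`, `build_doc_values`, and `search` are all assumptions made up for illustration. The inverted index maps term -> document IDs (used for full-text lookup), while the doc-values-style column maps document ID -> field value (used for sorting or grouping the hits).

```python
from collections import defaultdict

# Hypothetical toy corpus; doc IDs are contiguous integers starting at 0.
docs = {
    0: {"title": "fast search engines", "body": "inverted indexes map terms to documents"},
    1: {"title": "column stores", "body": "doc values map documents to field values"},
    2: {"title": "search basics", "body": "full text search looks up terms"},
}


def build_inverted_index(docs, field):
    """Term -> set of doc IDs containing that term (naive whitespace tokenizer)."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for term in fields[field].lower().split():
            index[term].add(doc_id)
    return index


def build_doc_values(docs, field):
    """Column-style storage: position i holds the field value of doc ID i."""
    return [docs[doc_id][field] for doc_id in sorted(docs)]


def search(index, query):
    """AND query: return doc IDs that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


body_index = build_inverted_index(docs, "body")
title_values = build_doc_values(docs, "title")

# Full-text lookup goes through the inverted index ...
hits = search(body_index, "terms")
# ... while ordering the hits by title reads the per-document column.
sorted_hits = sorted(hits, key=lambda doc_id: title_values[doc_id])
print(sorted_hits)
```

In this sketch the search itself never touches the column; it is only consulted when the hits need to be ordered by a field value, which is the case I am unsure matters for this problem.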
Upvotes: 0
Views: 21