Reputation: 47
What is the costs of querying on subdocuments in cosmosdb ? When reading the documentation it seems that only IDs and their paths are indexed.
Does it means that each time you are querying on a subdocuments using the From
clause like this:
SELECT *
FROM Families.children
it will parse the documents matching this path and create a view of them? Are sub documents stored as well?
Upvotes: 0
Views: 655
Reputation: 4994
CosmosDB stores the entire JSON document as a tree (any depth, including what i guess you call "subdocuments"). The FROM..IN
join syntax is just a syntactic sugar to make it easier for you to express those paths in your SQL query. For example, "remove" array indexes from tree paths and provide pretty shortcuts for both SELECT
and WHERE
part. It never actually includes more data to your query just assigns named pointers to parts within each of the to-be-returned root documents.
The indexed paths just contain the reference to matching document roots they do not make copies of the "subdocuments". Contrary to relational SQL, there is no covering index in CosmosDB.
When you "select a subdocument" you are actually instructing cosmosDB to return a projection from root document tree-presentation to its subpart. In a sense it is (non-serialized) view to the original document.
Definitely go and check the video Azure Cosmos DB Indexing which explains it really well. The main takeaway is this image (screenshot from the above video):
Most important for performance is not HOW you reference an node in index tree, but whether it is indexed at all and selective enough. Where the index value resides in document is in practice not too relevant.
I guess there is a theoretical overhead to traversing the index tree (and document tree) deeper but i'm pretty sure the impact is in practice so micro you usually can't even measure it. Most of the work in that N ms query is spent elsewhere (checking additional predicates on matching documents from index, building output Json object from internal tree model, serialization).
Similarly to relational SQL, returning only "subdocuments" is most likely slightly faster compared to full root documents, because the SELECT-step can skip non-relevant parts of the document tree. But probably it depends more on overall included tree node count and data size, and not as much on the shape of the tree.
.. and the usual advice on perfomance - DO write some tests. For CosmosDB you can measure the RU cost and check the PopulateQueryMetrics header.
The above graphics would suggest that they are not - linking to root document only. But it is obviously a simplification and hence not conclusive. Still, you should be able to design a test for this. Alternative is to go ask the team for internal design via [email protected].
Upvotes: 1