Reputation: 5543
The documentation for the MilvusClient loadCollection() method just says "This method loads the specified collection and all the data within to memory for search or query." I assume that this loading is done on the server, not the client but that gives me a number of questions:
The documentation says that this call is needed in order to perform any search in the collection. But why is this call explicitly needed? Is there any reason the server can't just load the data when a query needs it, if it is not already loaded?
And more important: What about memory leaks?
I can see there is also a method to release loaded data in the MilvusClient, but as far as I can see that method is never called from the langchain4j code. So how does the Milvus server handle loaded collections if they are never freed? Do they stay in memory forever, or do seldom-used collections get flushed out if Milvus needs more memory?
Upvotes: 0
Views: 231
Reputation: 719279
I assume that this loading is done on the server, not the client ...
That is a correct assumption. The SO TagWiki for Milvus says:
Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.
It wouldn't make sense to have a client-server architecture, and do all of that work on the client side. Ergo, the collection is loaded server-side.
"This method loads the specified collection and all the data within to memory for search or query."
This is a bit of a tangent, but in the javadocs I am looking at, it says (just)
Loads a collection to memory before search or query.
There is a parameter that (I think) can control how much data is loaded. Based on what I read in other parts of the documentation, you should be able to tell it not to load some fields immediately.
Is there any reason the server can't just load the data when a query needs it, if it is not already loaded?
If you are asking for the reasons why they designed it this way, you would really need to ask the designers!
But if you have an architecture where data is only loaded as needed, then you run into the problem of not knowing when it is no longer needed.
And more important: What about memory leaks?
Yes, that is a problem; i.e. if a client loads data and then forgets to release it. But there are potentially ways to deal with that. For example, when a client disconnects, the server could automatically release any collections that it has loaded. (Apparently, it doesn't. Disconnecting a client doesn't affect the collections that it loaded. See https://github.com/milvus-io/milvus/discussions/27771)
And the flip-side to this is that if Milvus didn't have an explicit load/release model but relied entirely on lazy loading, then it would run into performance problems in various scenarios; e.g.
I imagine that the Milvus designers considered all of this (and maybe other issues) in arriving at the current form of the system.
I can see there is also a method to release loaded data in the MilvusClient, but as far as I can see that method is never called from the langchain4j code. So how does the Milvus server handle loaded collections if they are never freed? Do they stay in memory forever, or do seldom-used collections get flushed out if Milvus needs more memory?
From what I have read, collections are not "flushed out".
If this was a problem in practice, a client application could be modified to explicitly release the collections that it loaded before it disconnected. Or you could implement a separate service to manage the loading and releasing of collections and keep track of which client applications are still alive.
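As a rough illustration of that second idea, here is a minimal sketch of a reference-counting manager that loads a collection when the first user asks for it and releases it when the last user is done. The `CollectionBackend` interface, `CollectionLeaseManager`, and `RecordingBackend` are hypothetical names invented for this example, not part of Milvus or langchain4j; a real adapter would presumably delegate `load` and `release` to the corresponding MilvusClient calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the server's load and release calls (not a Milvus API). */
interface CollectionBackend {
    void load(String collectionName);
    void release(String collectionName);
}

/** Demo backend that only records calls, so the sketch runs without a server. */
final class RecordingBackend implements CollectionBackend {
    final List<String> calls = new ArrayList<>();
    @Override public void load(String collectionName) { calls.add("load:" + collectionName); }
    @Override public void release(String collectionName) { calls.add("release:" + collectionName); }
}

/** Counts holders per collection: loads on the first acquire, releases on the last close. */
final class CollectionLeaseManager {
    private final CollectionBackend backend;
    private final Map<String, Integer> holders = new HashMap<>();

    CollectionLeaseManager(CollectionBackend backend) {
        this.backend = backend;
    }

    /** Returns a lease; closing it hands the collection back. */
    synchronized Lease acquire(String name) {
        int count = holders.getOrDefault(name, 0);
        if (count == 0) {
            backend.load(name); // first holder triggers the load
        }
        holders.put(name, count + 1);
        return new Lease(name);
    }

    /** Number of open leases for a collection. */
    synchronized int holderCount(String name) {
        return holders.getOrDefault(name, 0);
    }

    // Caller must hold the manager's lock.
    private void giveBack(String name) {
        int count = holders.get(name) - 1;
        if (count == 0) {
            holders.remove(name);
            backend.release(name); // last holder triggers the release
        } else {
            holders.put(name, count);
        }
    }

    final class Lease implements AutoCloseable {
        private final String name;
        private boolean closed;

        private Lease(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            synchronized (CollectionLeaseManager.this) {
                if (closed) return; // closing twice is harmless
                closed = true;
                giveBack(name);
            }
        }
    }
}
```

Because `Lease` implements `AutoCloseable`, a caller can wrap its queries in `try (var lease = manager.acquire("docs")) { ... }` so the hand-back happens even if the query throws. Note that this only tracks holders inside one process; detecting client applications that died without closing their leases would need something extra, such as heartbeats.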
The fact that Milvus doesn't handle this implies (at least, to me) that this is not a significant enough problem for enough users to require a solution. (Or I could be missing something ...)
Upvotes: 0
Reputation: 961
It is assumed that a Milvus collection has been created with in-memory indexing. As might be known, Milvus generates certain files related to the collection on the hard disk within the volumes directory, where the Docker Compose files are located. When a collection is created and data is inserted into it, the data is loaded into memory, eliminating the need to call the loadCollection() function again. However, in the case of having multiple collections that cannot all be loaded into memory simultaneously, it becomes necessary to release() some of them, for example, collection1. To perform a query on collection1, the loadCollection() function must be called first, followed by the query. The time required for loadCollection() to complete depends on the size of collection1.
The collection remains loaded in memory indefinitely until it is released, the Docker Compose is stopped by the user, or an error occurs.
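For the case where not all collections fit in memory at once, the load-before-query and release steps described above could be automated on the client side. The following is only a sketch under assumptions: `LoadReleaseOps`, `BoundedCollectionLoader`, `CallLog`, and the `maxLoaded` limit are hypothetical names made up for this example, and the least-recently-used policy is my choice, not something Milvus does itself. A real adapter would presumably forward `load` and `release` to the Milvus client.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical stand-in for the server's load and release calls (not a Milvus API). */
interface LoadReleaseOps {
    void load(String collectionName);
    void release(String collectionName);
}

/** Demo implementation that only records calls, so the sketch runs without a server. */
final class CallLog implements LoadReleaseOps {
    final List<String> calls = new ArrayList<>();
    @Override public void load(String collectionName) { calls.add("load:" + collectionName); }
    @Override public void release(String collectionName) { calls.add("release:" + collectionName); }
}

/** Keeps at most maxLoaded collections loaded, releasing the least recently used first. */
final class BoundedCollectionLoader {
    private final LoadReleaseOps ops;
    private final int maxLoaded;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Boolean> loaded = new LinkedHashMap<>(16, 0.75f, true);

    BoundedCollectionLoader(LoadReleaseOps ops, int maxLoaded) {
        if (maxLoaded < 1) {
            throw new IllegalArgumentException("maxLoaded must be >= 1");
        }
        this.ops = ops;
        this.maxLoaded = maxLoaded;
    }

    /** Call this before querying a collection. */
    synchronized void ensureLoaded(String name) {
        if (loaded.get(name) != null) {
            return; // get() already marked it as most recently used
        }
        while (loaded.size() >= maxLoaded) {
            Iterator<String> it = loaded.keySet().iterator();
            String leastRecentlyUsed = it.next();
            it.remove();
            ops.release(leastRecentlyUsed); // make room before loading
        }
        ops.load(name);
        loaded.put(name, Boolean.TRUE);
    }

    /** Currently loaded collections, least recently used first. */
    synchronized List<String> loadedNames() {
        return new ArrayList<>(loaded.keySet());
    }
}
```

With `maxLoaded` set to 2, asking for collection1, collection2, collection1 again, and then collection3 would release collection2, since collection1 was touched more recently. The bookkeeping only reflects what this one loader has done; collections loaded or released by other clients are invisible to it.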
Regarding the flush() function:
This operation seals all segments in the collection. Any insertions following this operation will result in the creation of a new segment.
Upvotes: 0