Reputation: 1890
I'm using VB.Net, and I have a set of data which I have to able to filter through fairly quickly. Basically, the program is like google sugest, but instead of a drop-down menu, I'm using a listbox. When a user enters a word, I compare the word using LINQ and filter those that contain the user's input. The data are all strings of variable length (from 0 to 200 characters, most on 150 character mark), and I have 240,000+ of this strings and counting- all stored in an XML file.
A colleague of mine told me that loading all of that to memory (using VB.Net's XML serializer plus collections of string/objects) is not practical, and would slow the 'startup' time of the program. I haven't finished building the program yet and I'm having second thoughts about continuing this path.
So, my question is: Should I continue with my current approach on the problem (which is load everything to memory on startup), or is there a better way of solving my dilemma?
Upvotes: 1
Views: 546
Reputation: 96702
You might be better served by using binary serialization rather than XML serialization to persist the data that your app reads on startup, particularly if you end up implementing a data structure that's faster to search than a `StringCollection. You'd still maintain the XML version of the data somewhere, of course.
And by all means, use a BackgroundWorker
to load the data asynchronously if that'll make your application feel more responsive.
Upvotes: 0
Reputation: 22220
It may not be a bad idea to load the XML into memory when the app starts up. But if you go this route I'd look into using the BackgroundWorker thread. The idea would be to load the XML into memory asynchronously so the UI is still responsive as this is going on. As far as the user is concerned the app shouldn't appear to start any slower, and yet once done the Google-suggest-like feature should be significantly faster.
I must say that even in memory this is an inherently inefficient operation since you have no advantage of using an index when querying an XML file in this way. This is something that would be 10X faster in SQL with full-text searching.
Of course XML has the advantage of being self-contained and requiring no additional components. And that makes it a decent choice for small desktop apps that query small amounts of data. Otherwise I would consider using a database for better performance.
Upvotes: 0
Reputation: 75125
The question seems to imply an online application. A few suggestions if that is the case:
Edit: Independently of how the data gets downloaded and cached, I second the opinion of Mircea Grelus that an xml file of this size is a poor substitute for a database.
Upvotes: 0
Reputation: 2915
If you want to prevent startup time and keeping it in memory isn't an issue on performance, then load it asynchronously. Although loading 240.000+ strings from an XML and keeping it in memory doesn't sound like the greatest idea. Probably a database would be the better approach. Or at least some format like JSON that's faster to parse.
Upvotes: 4
Reputation: 185643
You're talking about loading roughly 36MB of strings. While this isn't a daunting amount by any means (though you could probably load it faster reading the XML yourself...I wouldn't go with the serialization engine if I was worried about performance), it's also a non-trivial amount. You're looking a adding a couple of seconds to your startup time, assuming you don't do it asynchronously as Mircea suggests.
If you do do it asynchronously, you'll have to ensure that any UI process that relies on the data doesn't occur until after it has loaded. That may be a difficult thing to ensure.
Upvotes: 0
Reputation: 50097
Depends on a number of things:
If
((you know the strings will not hugely increase in number) &&
(you know the spec of the machines that will run your app) &&
(you are able to test that the load time is *good enough* on the above spec))
{
**don't bother changing approach.**
}
else
{
**change approach.**
}
The alternative approach is obviously some kind of asynch lazy-load.
Upvotes: 0