GaiusSensei

Reputation: 1890

Should I load everything in memory upon application start?

I'm using VB.Net, and I have a set of data that I have to be able to filter fairly quickly. Basically, the program is like Google Suggest, but instead of a drop-down menu, I'm using a listbox. When a user enters a word, I use LINQ to filter the data down to the entries that contain the user's input. The data are all strings of variable length (from 0 to 200 characters, most around the 150-character mark), and I have 240,000+ of these strings and counting, all stored in an XML file.

A colleague of mine told me that loading all of that into memory (using VB.Net's XML serializer plus collections of strings/objects) is not practical and would slow the startup time of the program. I haven't finished building the program yet, and I'm having second thoughts about continuing down this path.

So, my question is: Should I continue with my current approach to the problem (loading everything into memory on startup), or is there a better way of solving my dilemma?

Upvotes: 1

Views: 546

Answers (6)

Robert Rossney

Reputation: 96702

You might be better served by using binary serialization rather than XML serialization to persist the data that your app reads on startup, particularly if you end up implementing a data structure that's faster to search than a `StringCollection`. You'd still maintain the XML version of the data somewhere, of course.

And by all means, use a BackgroundWorker to load the data asynchronously if that'll make your application feel more responsive.
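As a rough, language-agnostic sketch of the "binary cache next to the XML" idea, here it is in Python rather than VB.Net. Everything specific in it is an assumption for illustration: the `<item>` element name, the file paths, and the use of `pickle` as a stand-in for whatever binary serializer you'd actually pick. The XML stays the source of truth; the binary file is rebuilt whenever the XML is newer.

```python
import os
import pickle
import xml.etree.ElementTree as ET


def load_strings(xml_path, cache_path):
    """Return the list of strings, preferring a binary cache over the XML."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)
    )
    if cache_is_fresh:
        # Fast path: no XML parsing at all.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Slow path: parse the XML (the source of truth) and refresh the cache.
    # The <item> element name is a hypothetical layout, not the asker's schema.
    root = ET.parse(xml_path).getroot()
    strings = [item.text or "" for item in root.iter("item")]
    with open(cache_path, "wb") as f:
        pickle.dump(strings, f, protocol=pickle.HIGHEST_PROTOCOL)
    return strings
```

The first run pays the XML parsing cost once; later runs only read the binary file. Whether that is actually faster for your data is something to measure rather than assume.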

Upvotes: 0

Steve Wortham

Reputation: 22220

It may not be a bad idea to load the XML into memory when the app starts up. But if you go this route I'd look into using a BackgroundWorker. The idea would be to load the XML into memory asynchronously so the UI stays responsive while this is going on. As far as the user is concerned the app shouldn't appear to start any slower, and yet once loading is done the Google-Suggest-like feature should be significantly faster.

I must say that even in memory this is an inherently inefficient operation, since querying the XML data this way gets no benefit from an index. This is the kind of lookup that could be something like 10X faster in SQL with full-text searching.

Of course XML has the advantage of being self-contained and requiring no additional components. And that makes it a decent choice for small desktop apps that query small amounts of data. Otherwise I would consider using a database for better performance.
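To make the "no index" point concrete, here is a minimal sketch of one way to index strings for substring ("contains") queries without a database: a trigram index. This is not SQL full-text search, just a simple in-memory technique swapped in for illustration, and it's written in Python rather than VB.Net. The function names are made up for the example.

```python
from collections import defaultdict


def build_trigram_index(strings):
    """Map each lowercase 3-character substring to the ids of strings containing it."""
    index = defaultdict(set)
    for i, s in enumerate(strings):
        lowered = s.lower()
        for j in range(len(lowered) - 2):
            index[lowered[j:j + 3]].add(i)
    return index


def search(strings, index, query):
    """Return all strings that contain query (case-insensitive)."""
    q = query.lower()
    if len(q) < 3:
        # Too short for the index: fall back to the plain linear scan.
        return [s for s in strings if q in s.lower()]
    # Candidates must contain every trigram of the query.
    candidate_ids = None
    for j in range(len(q) - 2):
        ids = index.get(q[j:j + 3], set())
        candidate_ids = ids if candidate_ids is None else candidate_ids & ids
        if not candidate_ids:
            return []
    # Trigrams can match in the wrong order or position, so verify each candidate.
    return [strings[i] for i in sorted(candidate_ids) if q in strings[i].lower()]
```

The trade-off is memory: the index can easily be larger than the strings themselves, so for 240,000+ entries it's worth measuring before committing to it. A database with full-text indexing moves that cost to disk.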

Upvotes: 0

mjv

Reputation: 75125

The question seems to imply an online application. A few suggestions if that is the case:

  • The data could / should be zipped. I suspect it would compress very nicely.
  • Maybe the data could be cached across multiple sessions, possibly delivered as HTML content with an expiry cache date as appropriate. This would avoid loading it every time, and may be feasible if the data isn't updated frequently.
  • The suggestion feature could be initially disabled (e.g. by showing a "loading..." message while the application initializes the cache asynchronously). That way the application would be available quickly upon startup, even though the suggest feature may lag by up to, say, 30 seconds or so.
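The last bullet could be sketched roughly like this. It's Python rather than VB.Net, and the `SuggestionStore` class and its `loader` callback are hypothetical names for the example; in a WinForms app a BackgroundWorker and its completion event would play the role of the thread and the flag.

```python
import threading


class SuggestionStore:
    """Holds the suggestion data and reports whether it has finished loading."""

    def __init__(self, loader):
        self._loader = loader          # callable returning a list of strings
        self._strings = []
        self._ready = threading.Event()

    def start_loading(self):
        """Kick off the load on a background thread and return immediately."""
        t = threading.Thread(target=self._load, daemon=True)
        t.start()
        return t

    def _load(self):
        self._strings = self._loader()
        self._ready.set()              # publish only after the data is complete

    @property
    def ready(self):
        return self._ready.is_set()

    def suggest(self, text):
        """Return matches, or None while the data is still loading."""
        if not self.ready:
            return None                # caller shows "loading..." instead
        needle = text.lower()
        return [s for s in self._strings if needle in s.lower()]
```

The point of returning `None` rather than an empty list is that the UI can tell "still loading" apart from "no matches" and show the right message.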

Edit: Regardless of how the data gets downloaded and cached, I second the opinion of Mircea Grelus that an XML file of this size is a poor substitute for a database.

Upvotes: 0

Mircea Grelus

Reputation: 2915

If you want to avoid a slow startup and keeping the data in memory isn't a performance issue, then load it asynchronously. That said, loading 240,000+ strings from an XML file and keeping them in memory doesn't sound like the greatest idea. A database would probably be the better approach, or at least a format like JSON that's faster to parse.

Upvotes: 4

Adam Robinson

Reputation: 185643

You're talking about loading roughly 36MB of strings. While this isn't a daunting amount by any means (though you could probably load it faster by reading the XML yourself... I wouldn't go with the serialization engine if I were worried about performance), it's also a non-trivial amount. You're looking at adding a couple of seconds to your startup time, assuming you don't do it asynchronously as Mircea suggests.

If you do do it asynchronously, you'll have to ensure that any UI process that relies on the data doesn't occur until after it has loaded. That may be a difficult thing to ensure.
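For reference, the 36MB estimate is just the figures from the question multiplied out, as in the small Python calculation below. One thing worth keeping in mind: .NET strings are stored as UTF-16, so the character data in memory takes about two bytes per character, before any per-object overhead (which this sketch does not try to estimate).

```python
def estimate(count, avg_chars, bytes_per_char):
    """Rough size of the character data alone, in bytes."""
    return count * avg_chars * bytes_per_char


COUNT = 240_000      # number of strings, from the question
AVG_CHARS = 150      # typical length, from the question

one_byte = estimate(COUNT, AVG_CHARS, 1)   # 36,000,000 bytes at 1 byte per character
utf16 = estimate(COUNT, AVG_CHARS, 2)      # 72,000,000 bytes at 2 bytes per character

print(f"{one_byte / 1_000_000:.0f} MB at 1 byte per character")
print(f"{utf16 / 1_000_000:.0f} MB of character data as UTF-16")
```

Either way it's tens of megabytes, which fits comfortably in memory on a typical desktop but is enough to be noticeable at load time.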

Upvotes: 0

JohnIdol

Reputation: 50097

Depends on a number of things:

If 
((you know the strings will not hugely increase in number) && 
(you know the spec of the machines that will run your app) && 
(you are able to test that the load time is *good enough* on the above spec))
{
**don't bother changing approach.** 
}
else
{
**change approach.**
} 

The alternative approach is obviously some kind of asynchronous lazy loading.

Upvotes: 0
