Reputation: 8140
I have a large dictionary of English words (around 70k of them) that I load into memory at the beginning of the program. They are loaded into a radix trie data structure, and each trie node often links to many other nodes (for example, the antonyms of "dead": "dead" -> "alive", "well"). Each node also has a std::vector<MetaData> that contains various miscellaneous metadata for my program.
Now, the problem is the loading time of this file. Reading the file from disk, deserializing it, and allocating the data structure together take a lot of time (4-5 seconds).
Currently, I'm working on making the load asynchronous (or bit by bit, a fraction of it per frame), but due to the nature of the application (it's a mobile keyboard), there are just plenty of times where it simply has to be loaded quickly.
What can be done to speed up loading? Memory-pool everything? I am benchmarking different parts to see what can be optimized, but so far it looks like it's just little things that add up.
Upvotes: 1
Views: 1104
Reputation: 133950
If the trie is static (i.e. doesn't change when the program's running), then build an optimized version in an array using array indexes in place of pointers. You can then save that as your data file. Startup then amounts to just loading that block of data into memory.
Doing it that way makes some things less convenient (you'll have to use arrays rather than std::vector, for example), and you might have to do a bit of casting, but with a little thought you end up with a very compact and very fast data structure that doesn't suffer from the allocation overhead of creating an object for each node. Instead, it's essentially an array of variable-length structures.
I did this for an application that used a directed acyclic word graph (DAWG). Rather than rebuild the DAWG every time the program was loaded (a time-consuming process), I had a utility program that created the DAWG and shipped that as the data file in place of the word list.
Upvotes: 3
Reputation: 1463
Another idea, based on the fact that it is a mobile keyboard application: some words are used more often than others, so you could organize the data so that the frequently used words are loaded first, leaving the infrequently used ones to be loaded as they are needed (or as you have time).
Upvotes: 0
Reputation: 57678
You should review the structure of the data to make the data faster to load.
Also, splitting into multiple tables may speed things up.
For example, have one table for the words, another table for synonyms and additional tables for other relationships.
The first table should assign each word an index. This allows the synonym table to be represented as pairs of word indices, which should load fast.
You can then build any internal containers from the loaded data. The reason for having different data structures for stored data vs. internal data is optimization: the structures used for data storage (and loading) are optimized for loading, while the structures for internal data are optimized for searching.
Upvotes: 1
Reputation: 31290
Not knowing the details, only a vague idea:
Loading the bulk data (entries) will give you the basic dictionary.
For all the cross references like synonyms, antonyms and whatever, load and process the data in the background, after you've shown "ready". Chances are, by the time the user has typed in the first query, you are ship-shape.
Later
If the file is rather big, reading a compressed version may be a gain.
Also, a buffered reader with a suitably increased buffer size may help.
Upvotes: 1