Reputation: 445
I need to develop an application that reads a file containing a graph of the highways of Russia. Based on the content of this file, the application must find the shortest route between two specified towns. The application has to be written in unmanaged C++, developed in MS VS 2013 as a C++ console application without MFC support. The customer's computer runs Windows 7. The A* algorithm must be used as the search engine. My problem is the following: the file containing the highway graph is 25 GB in size, but the customer's computer has only 16 GB of RAM and there is no opportunity to extend it. Is there any programming technology for unmanaged C++ that I can use in this case to process a large file, i.e. a file larger than the RAM capacity of the computer? How should I design the application architecture in this case?
Upvotes: 0
Views: 2357
Reputation: 2003
Not sure if this is still an open issue, just saw this question.
A 32-bit application is perfectly capable of reading/writing files > 4GB.
You do not even need to create a file mapping for it, unless you absolutely must have (the equivalent of) an in-memory representation of the data. If that is the case, it would be a bit more work but still doable (you could create a class that wraps a mapped view of a file and handles "chunking" the data into sizes that fit into your available address space). As noted by others, you cannot map a file larger than the address space all at once. Also, the comments and caveats in previous answers/comments regarding performance considerations apply.
A few questions: How is the file indexed (if it is)? Can you post an example of the index layout and a record if it has an index? Even if not indexed, you could create your own index and access records through your wrapper class. Can you post an example of an individual record?
If you do not need an in-memory representation, a better option - certainly from a performance standpoint - would be to just create a class that handles reads from/writes to the file and uses SetFilePointerEx to seek before any reads. If the file has an index that fits into the available address space, you could even map just that portion of the file into memory and do seeks/reads as necessary for the individual (unmapped) records. A more complex approach could also cache records on an MRU (or some other) basis to help with performance.
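The seek-before-read idea can be sketched portably. This is only an illustration: it uses `std::ifstream::seekg` where the answer suggests the Win32 `SetFilePointerEx`/`ReadFile` pair, and the fixed-size `EdgeRecord` layout is entirely made up, since the real record format of the file is unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-size edge record; the real layout depends on the file.
struct EdgeRecord {
    std::uint32_t from;
    std::uint32_t to;
    double distanceKm;
};

// Seek-and-read accessor: only one record is ever in memory at a time,
// so the total file size is irrelevant.
class RecordFile {
public:
    explicit RecordFile(const std::string& path) : in_(path, std::ios::binary) {}
    bool good() const { return in_.good(); }

    // Reads the record at the given zero-based index, seeking first.
    bool readRecord(std::uint64_t index, EdgeRecord& out) {
        in_.clear();  // reset any previous end-of-file state
        in_.seekg(static_cast<std::streamoff>(index * sizeof(EdgeRecord)),
                  std::ios::beg);
        in_.read(reinterpret_cast<char*>(&out), sizeof(EdgeRecord));
        return static_cast<bool>(in_);
    }

private:
    std::ifstream in_;
};

// Writes a few sample records so the accessor can be exercised.
void writeSample(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    for (std::uint32_t i = 0; i < 3; ++i) {
        EdgeRecord r{i, i + 1, 10.0 * (i + 1)};
        out.write(reinterpret_cast<const char*>(&r), sizeof(r));
    }
}
```

A Win32 version would keep the same interface and only swap the seek/read internals, which is what makes the index-in-memory, records-on-disk split practical.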
Upvotes: 0
Reputation: 73376
Your question is very broad! The main problem you'll face is that loading/unloading chunks of data can make your search extremely slow, especially if adjacent nodes are in different chunks.
As your graph is real-world geography, your A* heuristic will almost certainly be the straight-line distance between points. This means that your algorithm will tend to develop paths primarily based on the distance to the target, and you can use this to optimize loading/unloading:
Here are some hints:
if you can organise your data file by grouping data into geographic squares, you can at least reduce the loading/unloading cost a little.
think of an in-memory index of the file offset of each square, so that you can locate and load new chunks faster.
you could also combine this approach with caching. Instead of loading big squares into one chunk of maximum size, load smaller squares into several smaller memory chunks managed with a caching algorithm (least recently used gets unloaded): geographic areas that are no longer needed will be replaced by the most frequently used nodes (i.e. nodes around highways).
with a little creativity, you could also deviate slightly from standard A* by adding a little bit of "beaming": instead of always expanding the best path, take into consideration that it can sometimes be more efficient to expand paths that remain in the same memory chunk.
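The caching hint above can be sketched as a small LRU cache keyed by square id. Everything here is illustrative: `loadSquare` is a stub standing in for a real seek-and-read of that square's byte range in the 25 GB file, and the node representation is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal LRU cache for geographic "squares": key = square id,
// value = the nodes loaded for that square.
class SquareCache {
public:
    explicit SquareCache(std::size_t capacity) : capacity_(capacity) {}

    const std::vector<int>& get(int squareId) {
        auto it = map_.find(squareId);
        if (it != map_.end()) {
            // Cache hit: move to front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (map_.size() == capacity_) {
            // Evict least recently used (back of the list).
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(squareId, loadSquare(squareId));
        map_[squareId] = order_.begin();
        return order_.front().second;
    }

    std::size_t size() const { return map_.size(); }

private:
    // Stub: stands in for seeking to the square's file offset and reading it.
    static std::vector<int> loadSquare(int squareId) {
        return {squareId * 100, squareId * 100 + 1};
    }

    std::size_t capacity_;
    std::list<std::pair<int, std::vector<int>>> order_;
    std::unordered_map<int,
        std::list<std::pair<int, std::vector<int>>>::iterator> map_;
};
```

With this in place, the A* loop asks the cache for a node's square instead of touching the file directly, and frequently revisited areas stay resident.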
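For reference, a minimal A* with the straight-line-distance heuristic mentioned above might look like the following. The graph layout and coordinates are toy data; a chunk-aware "beaming" variant could bias the priority of nodes that live in the currently loaded chunk, but that tweak is left as a comment.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Minimal A*: adj[u] lists (neighbour, edge length) pairs; pos holds node
// coordinates; returns the shortest-path length from start to goal.
double aStar(const std::vector<std::vector<std::pair<int, double>>>& adj,
             const std::vector<Point>& pos, int start, int goal) {
    auto h = [&](int v) {  // admissible heuristic: straight-line distance
        double dx = pos[v].x - pos[goal].x, dy = pos[v].y - pos[goal].y;
        return std::sqrt(dx * dx + dy * dy);
    };
    std::vector<double> g(adj.size(), std::numeric_limits<double>::infinity());
    using Item = std::pair<double, int>;  // (f = g + h, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
    g[start] = 0.0;
    open.emplace(h(start), start);
    while (!open.empty()) {
        auto [f, u] = open.top();
        open.pop();
        if (u == goal) return g[u];
        if (f > g[u] + h(u)) continue;  // stale queue entry, skip
        for (auto [v, w] : adj[u]) {
            if (g[u] + w < g[v]) {
                g[v] = g[u] + w;
                // A chunk-aware variant could subtract a small bonus here
                // when v lies in the currently loaded memory chunk.
                open.emplace(g[v] + h(v), v);
            }
        }
    }
    return std::numeric_limits<double>::infinity();
}
```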
Upvotes: 0
Reputation: 179819
Using std::getline, you can read a file one line at a time. 640 kB of RAM would be enough ;)
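A line-at-a-time pass over such a file can be sketched as below. The `from;to;km` line format is made up purely for illustration; the real format of the highway file is unknown.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical parsed record for one "from;to;km" line.
struct Edge { std::string from, to; double km; };

// Splits one semicolon-separated line into an Edge.
bool parseLine(const std::string& line, Edge& e) {
    std::istringstream ss(line);
    std::string kmField;
    if (!std::getline(ss, e.from, ';')) return false;
    if (!std::getline(ss, e.to, ';')) return false;
    if (!std::getline(ss, kmField)) return false;
    e.km = std::stod(kmField);
    return true;
}

// Streams the whole input with only one line in memory at a time,
// so the input size does not matter.
std::size_t countEdges(std::istream& in) {
    std::size_t n = 0;
    std::string line;
    Edge e;
    while (std::getline(in, line))
        if (parseLine(line, e)) ++n;
    return n;
}
```

The same loop works on a `std::ifstream` opened over the 25 GB file; memory use stays bounded by the longest single line.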
I'm pretty sure it's a text file, possibly even XML. In that case you'd use a dedicated streaming ("SAX") XML parser, which never holds the whole document in memory. I know it's not binary because I know that you can fit the entire map of Europe (highways and all minor roads too) in under 8 GB.
BTW, A* is obsolete. Modern routing algorithms such as ArcFlags are much faster.
Upvotes: 2
Reputation: 2601
You have to call CreateFileMapping on the file handle and then MapViewOfFile on the mapping handle. It is very convenient and allows you to access the whole file without explicitly reading it. Your build target must be 64-bit in this case...
Upvotes: 2
Reputation:
Hm... why not just read the file in small portions? Let's say, 128 records at a time. You can create a class which hides this mechanism inside, so other parts of the program won't notice the difference between full loading and this method. Another way is to map the file to memory.
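One possible shape for such a class, under the assumption that one record is one text line (the real record format is unknown, and the batch size of 128 is just the number suggested above):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Callers ask for one record at a time and never see that records actually
// arrive from the stream in batches.
class BatchedReader {
public:
    explicit BatchedReader(std::istream& in, std::size_t batch = 128)
        : in_(in), batch_(batch) {}

    // Fetches the next record; returns false at end of input.
    bool next(std::string& record) {
        if (pos_ == buf_.size() && !refill()) return false;
        record = buf_[pos_++];
        return true;
    }

private:
    // Reads up to batch_ records into the buffer in one go.
    bool refill() {
        buf_.clear();
        pos_ = 0;
        std::string line;
        while (buf_.size() < batch_ && std::getline(in_, line))
            buf_.push_back(line);
        return !buf_.empty();
    }

    std::istream& in_;
    std::size_t batch_;
    std::vector<std::string> buf_;
    std::size_t pos_ = 0;
};
```

Swapping this class's internals for a memory-mapped view later would not change the calling code, which is the point of hiding the mechanism.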
By the way, are you from Russia?
Upvotes: 0