Reputation: 2673
I have a program which analyzes 150,000 files. Valgrind reports no memory leak but the program slows over time.
Some of the problems were related to using std::string too often and mktime taking too much time. (see C++ slows over time reading 70,000 files)
But it still slows down over time. Lotharyx suggested that container usage is causing heap fragmentation.
I read the various flow charts on the pros and cons of the different STL containers but I didn't quite get it.
In the pseudo code below, I'm not sure I've made the right choices to avoid heap fragmentation.
fileList.clear()
scan all disks and build "fileList", a std::set of file paths matching a pattern.
// filepaths are named by date, eg 20160530.051000, so are intrinsically ordered
foreach(filePath in fileList)
{
if (alreadyHaveFileDetails(filePath))
continue;
// otherwise
collect file details into a fileInfoStruct; // like size, contents, mod
fileInfoMap[time_t date] = fileInfoStruct;
}
// fileInfoMap is ordered collection of information structs of about 100,000 files
// traverse the list in order
foreach (fileInfo in fileInfoMap)
{
if (meetsCondition(fileInfo))
{
TEventInfo event = makeEventInfo()
eventList.push_back(event);
}
}
And the above sequence repeats forever.
So for choice of containers, I've used (or need):
fileList
-- list of unique strings containing 150,000 pathnames.
I chose std::set because it automatically handles duplicates and maintains sort order.
No random access, only add the entries, sort them (manually or automatically), and iterate over them.
fileInfoMap
-- a collection of structures keyed by a time_t timestamp corresponding to the date of the file.
I chose std::map. It too would have 150,000 entries so occupies a lot of memory.
No random access, only add the entries to one end. Must iterate over them and, if necessary, delete entries from the middle.
eventList
-- a small list of "event" structures, say 50 items.
I chose std::vector. Not sure why really.
No random access, only add entries to one end and later iterate over the collection.
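Putting the pseudocode and the container choices above together, a compilable sketch might look like the following. The helper bodies (alreadyHaveFileDetails, dateOf, collectDetails, meetsCondition, makeEventInfo) are placeholders I've invented to make it run, not the real program's logic:

```cpp
#include <cstddef>
#include <ctime>
#include <map>
#include <set>
#include <string>
#include <vector>

// Placeholder structs; the real ones hold size, contents, mod time, etc.
struct FileInfo   { std::size_t size = 0; };
struct TEventInfo { time_t when = 0; };

std::set<std::string>      fileList;     // unique pathnames, kept sorted
std::map<time_t, FileInfo> fileInfoMap;  // ordered by file date
std::vector<TEventInfo>    eventList;    // small (~50 item) result list

// Stand-ins for the real scanning/parsing logic.
bool alreadyHaveFileDetails(const std::string&) { return false; }
time_t dateOf(const std::string& path) {         // real code parses the date
    return static_cast<time_t>(std::stoll(path.substr(0, 8)));
}
FileInfo collectDetails(const std::string& path) { return FileInfo{path.size()}; }
bool meetsCondition(const FileInfo& info) { return info.size > 0; }
TEventInfo makeEventInfo(const FileInfo&) { return TEventInfo{}; }

void onePass() {
    for (const std::string& filePath : fileList) {
        if (alreadyHaveFileDetails(filePath))
            continue;
        fileInfoMap[dateOf(filePath)] = collectDetails(filePath);
    }
    eventList.clear();                   // reuses the vector's capacity each pass
    for (const auto& kv : fileInfoMap) {
        if (meetsCondition(kv.second))
            eventList.push_back(makeEventInfo(kv.second));
    }
}
```

Note that clearing eventList rather than constructing a fresh vector each pass means its one allocation is reused forever.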
I'm fairly new to C++. Thanks for your consideration.
Upvotes: 0
Views: 1839
Reputation: 20730
Regarding memory management, containers belong to two large families: those that allocate all elements together, and those that allocate each element separately.
vector and deque belong to the first family; list, set and map to the second.
Memory fragmentation arises when elements are continuously added to and removed from a container that does not support relocating its elements in memory.
One way to avoid the problem is to use a vector, calling reserve to anticipate the memory need (thus reducing reallocations) and keeping the data sorted upon insertion.
Another way is to use a "linking-based" container (like list, set, etc.), giving it an allocator that allocates memory from larger chunks and recycles them, instead of calling raw malloc/free for every single element insertion/removal.
Have a look at std::allocator.
You can easily write an allocator by deriving from std::allocator and overriding the allocate/deallocate functions, adding all the required logic, and then passing your allocator as the optional template parameter of the container you want to use.
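As a concrete illustration, here is a sketch of such a recycling allocator. (Since C++11 an allocator does not actually need to derive from std::allocator; providing value_type, allocate and deallocate is enough. The class name is my own, and this toy version is not thread-safe and holds its pool until exit.)

```cpp
#include <cstddef>
#include <list>
#include <new>

// Sketch of a recycling allocator: freed single-element blocks go onto a
// per-type free list and are reused, so steady-state insert/erase cycles
// stop hitting the global heap. Not thread-safe; pool memory is kept until exit.
template <typename T>
struct RecyclingAllocator {
    using value_type = T;

    RecyclingAllocator() noexcept = default;
    template <typename U>
    RecyclingAllocator(const RecyclingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n == 1 && freeList()) {            // reuse a recycled block
            Node* node = freeList();
            freeList() = node->next;
            return reinterpret_cast<T*>(node);
        }
        std::size_t bytes = (n == 1) ? sizeof(Node) : n * sizeof(T);
        return static_cast<T*>(::operator new(bytes));
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n == 1) {                          // keep single blocks for reuse
            Node* node = reinterpret_cast<Node*>(p);
            node->next = freeList();
            freeList() = node;
        } else {
            ::operator delete(p);
        }
    }

private:
    union Node {
        Node* next;                            // link while on the free list
        alignas(T) unsigned char storage[sizeof(T)];
    };
    static Node*& freeList() {
        static Node* head = nullptr;           // one pool per element type
        return head;
    }
};

template <typename T, typename U>
bool operator==(const RecyclingAllocator<T>&, const RecyclingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const RecyclingAllocator<T>&, const RecyclingAllocator<U>&) noexcept { return false; }
```

You would use it as, e.g., std::list<FileInfo, RecyclingAllocator<FileInfo>>; for a map the allocator parameter is RecyclingAllocator<std::pair<const time_t, FileInfo>>.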
Upvotes: 2