Reputation: 509
I've been trying to read and index a set of links and Wikipedia abstracts with a Haskell program.
The problem is that my program is very slow and uses a large amount of memory. I've done some profiling, and I know most of the time (~50%) is spent in garbage collection. I've tried everything I could think of to get to what I have now.
Code, data, profiling output and compilation stuff: https://gist.github.com/ririw/8205284
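To give a sense of the shape of the program, here's a stripped-down sketch of the pattern (a sketch only, not the gist code; the file name and the counting fold are stand-ins):

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as L
import           Data.List                  (foldl')
import qualified Data.Map                   as M

main :: IO ()
main = do
    contents <- L.readFile "wikipedia_links_en.nt"
    -- Even with a strict fold over the lines, M.insertWith is lazy
    -- in the values, so each line leaves an unevaluated (+) thunk
    -- in the map, and the GC spends its time walking those chains.
    let step m line = M.insertWith (+) line (1 :: Int) m
    print (M.size (foldl' step M.empty (L.lines contents)))
```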
Thanks!
Upvotes: 0
Views: 232
Reputation: 509
OK, I've managed to reduce the memory use by 100MB or so on my test set (the first 400K lines of the DBpedia links dataset, available at http://downloads.dbpedia.org/3.9/en/wikipedia_links_en.nt.bz2).
I did this by switching to the io-streams library, which seems to have solved some of the laziness issues.
You can see the new solution at https://gist.github.com/ririw/8207250
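The core of the new version looks roughly like this (a simplified sketch, not the gist verbatim; I'm using the hashtables package for the table, and the file name, key/value types and per-line counter are placeholders):

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.ByteString.Char8 as B
import qualified Data.HashTable.IO     as H
import qualified System.IO.Streams     as Streams

type Index = H.BasicHashTable B.ByteString Int

main :: IO ()
main = do
    table <- H.new :: IO Index
    Streams.withFileAsInput "wikipedia_links_en.nt" $ \input -> do
        -- io-streams hands over one strict line at a time, so nothing
        -- keeps the whole file (or a chain of thunks) alive.
        lns <- Streams.lines input
        let loop !n = do
                mline <- Streams.read lns
                case mline of
                    Nothing   -> print n
                    Just line -> do
                        -- copy the line so the key doesn't pin the
                        -- much larger read buffer it was sliced from
                        H.insert table (B.copy line) n
                        loop (n + 1)
        loop (0 :: Int)
```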
On a 7,574,825-line dataset it does OK, and the memory usage ticks up the way I'd expect a hash table's to as it fills up and shuffles things around (https://i.sstatic.net/LFwpV.jpg). Of course, that might also be the io-streams buffers doing the same thing.
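If those bumps really are grow-and-rehash cycles, pre-sizing the table should smooth them out. A sketch of the idea using hashtables' newSized (untested on this dataset; the 8M figure is just my line count rounded up):

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.HashTable.IO     as H

-- Reserve room for ~8M keys up front so the table never has to
-- grow and rehash mid-run (trades a larger initial footprint
-- for a flatter memory curve).
mkIndex :: IO (H.BasicHashTable B.ByteString Int)
mkIndex = H.newSized 8000000
```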
It still uses a lot of memory :(, but I think this has solved the laziness/IO issue.
Upvotes: 1