Reputation: 12138
When reading a large XML file (a few GB), my Swift program keeps eating memory until the whole system crashes. Not good. After digging around and stripping out all the useful code, only the code below remains. It defines an NSXMLParserDelegate that implements a single protocol method. When run against a relatively small XML file of 17 MB, total allocations amount to 47 MB and dirty memory accounts for 77 MB. This strikes me as odd, since my code doesn't reference any of the data it is passed.
Is this an error in NSXMLParser, a misunderstanding on my part, or an error in my code?
import Foundation

var input = NSURL(fileURLWithPath: Process.arguments[1])!

class MyDelegate: NSObject, NSXMLParserDelegate {
    func parser(parser: NSXMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: NSDictionary) {
    }
}

var parser = NSXMLParser(contentsOfURL: input)!
var delegate = MyDelegate()
parser.delegate = delegate
parser.parse()
Documentation
Memory management becomes a heightened concern when you are parsing XML. Processing the XML often requires you to create many objects; you should not allow these objects to accumulate in memory past their span of usefulness. One technique for dealing with these generated objects is for the delegate to create a local autorelease pool at the beginning of each implemented delegation method and release the autorelease pool just before returning. NSXMLParser manages the memory for each object it creates and sends to the delegate. (source)
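The technique the documentation describes can be sketched as follows. This is only a minimal illustration, written with the current Foundation names (XMLParser, XMLParserDelegate) rather than the Swift 1 spelling used in the question, and assuming an Apple platform where autoreleasepool is available. The class name CountingDelegate and the inline XML string are made up for the example. Note that a local pool like this only bounds autoreleased temporaries created inside the callback; it may not affect objects the parser itself allocates before calling the delegate.

```swift
import Foundation

// Hypothetical delegate: counts start tags and does its per-element work
// inside a local autorelease pool.
final class CountingDelegate: NSObject, XMLParserDelegate {
    private(set) var elementCount = 0

    func parser(_ parser: XMLParser,
                didStartElement elementName: String,
                namespaceURI: String?,
                qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Autoreleased temporaries created in this closure are released when
        // the closure returns, not when the whole parse finishes.
        autoreleasepool {
            // Placeholder work that creates a temporary bridged object.
            let label = "\(elementName):\(attributeDict.count)" as NSString
            _ = label.length
            elementCount += 1
        }
    }
}

// Illustrative input; a real program would read from a file or stream.
let xml = "<root><item id=\"1\"/><item id=\"2\"/></root>"
let parser = XMLParser(data: Data(xml.utf8))
let delegate = CountingDelegate()
parser.delegate = delegate   // the parser does not retain its delegate
let succeeded = parser.parse()
print(succeeded, delegate.elementCount)
```

Whether this actually lowers peak memory for a multi-GB file would have to be checked in Instruments, since it depends on where the retained objects are created.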
Update
When using libxml2's SAX parser directly, memory usage levels off after a few seconds at around 100 MB. Why is NSXMLParser (mostly just a wrapper) using so much more memory?
Update 2
NSXMLParser should not be holding on to the data after the delegate has processed it. Most of the structures allocated by NSXMLParser have a ref count of 1 (see screenshot), and thus remain allocated. Manually releasing the memory helps, but that contradicts the memory management statement in the documentation and doesn't feel right.
Upvotes: 2
Views: 272
Reputation: 12138
Based on experimentation, I think that NSXMLParser does employ a global autoreleasepool, but that pool only covers the whole parse operation, not the individual callbacks to the delegate. Memory pressure therefore builds up while the file is being parsed and is only released once the complete file has been parsed.
Pseudo-code:
Call function parse {
    @autoreleasepool {
        Call delegate function with NSDictionary {
            Bridge NSDictionary to Swift dictionary
            Call my Swift delegate with Swift dictionary {
                // my code
            }
        }
    }
}
So both the NSDictionary and the bridged Swift dictionary stay on the heap until the parse function has finished. To reduce the memory pressure, don't use a Swift delegate, so that no Swift dictionary ends up on the heap. Or steer clear of NSXMLParser, use libxml2's SAX parser directly, and add more autoreleasepools.
Upvotes: 1