Reputation: 886
I have a big XML file, larger than 100 MB, and I want to check whether the structure of this file is valid.
I can try to load this file with DOMDocument; alternatively, I can read it with the PHP XML parser, which "lets you parse, but not validate, XML documents".
Is there any way to do this without fully loading the XML file into memory?
Upvotes: 3
Views: 1945
Reputation: 163458
Firstly, you don't say what kind of schema you are using for validation: DTD, XSD, RelaxNG?
Secondly, you mention PHP, but you don't say whether the solution has to be based on PHP. Could you, for example, use Java?
Generally speaking, validating an XML document against a schema is a streamable operation: it does not require building a tree representation of the XML document in memory. Finding a streaming validator that works in your environment should not be hard, but we need to know what the environment is (and what schema language you are using).
Upvotes: 4
Reputation: 8249
I think you need to look into the XMLReader class. More specifically, XMLReader::setSchema.
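As a minimal sketch of that suggestion, assuming the schema is an XSD: the helper name `validateXmlStreaming` and the file paths are made up for illustration, while `XMLReader::open`, `XMLReader::setSchema`, `XMLReader::read` and the `libxml_*` error functions are the standard PHP ones. The reader walks the document node by node, so the whole tree should never have to be held in memory.

```php
<?php
// Hypothetical helper: stream-validate $xmlPath against the XSD at $xsdPath.
// Returns a list of error messages; an empty list means no errors were reported.
function validateXmlStreaming(string $xmlPath, string $xsdPath): array
{
    // Collect libxml errors instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $errors = [];
    $reader = new XMLReader();

    if (!@$reader->open($xmlPath)) {
        $errors[] = "Could not open $xmlPath";
    } elseif (!@$reader->setSchema($xsdPath)) {
        // setSchema() has to be called after open() and before the first read().
        $errors[] = "Could not load schema $xsdPath";
        $reader->close();
    } else {
        // Pull nodes one at a time; validation happens as a side effect of reading.
        // The @ only hides PHP's generic "error while reading" warning;
        // the details are still collected by libxml below.
        while (@$reader->read()) {
            // Nothing to do per node: we only care about the reported errors.
        }
        $reader->close();
    }

    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $errors;
}
```

Calling `validateXmlStreaming('big.xml', 'schema.xsd')` (both paths are placeholders) would then give you an array to inspect. For RelaxNG, `XMLReader::setRelaxNGSchema` can be swapped in for `setSchema` in the same position.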
Upvotes: 3
Reputation: 17477
If all you want to do is check that the XML is structurally sound (that is, well-formed), you can use PHP's XML Parser. It will not validate the document against a DTD, which is all the documentation means when it says the parser does not validate.
Any of these error codes can be returned if the XML structure is found to be invalid while parsing it.
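A rough sketch of that approach, feeding the file to the parser in chunks so only one chunk is in memory at a time. The function name `checkWellFormed` and the chunk size are assumptions for illustration; `xml_parser_create`, `xml_parse`, `xml_get_error_code`, `xml_error_string` and `xml_get_current_line_number` are the standard XML Parser functions.

```php
<?php
// Hypothetical helper: check well-formedness of $xmlPath without loading it whole.
// Returns null if no error was found, or a message describing the first error.
function checkWellFormed(string $xmlPath, int $chunkSize = 8192): ?string
{
    $handle = @fopen($xmlPath, 'rb');
    if ($handle === false) {
        return "Could not open $xmlPath";
    }

    $parser = xml_parser_create();
    $error = null;

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            $error = "Could not read from $xmlPath";
            break;
        }

        // Tell the parser when it has received the last piece of the document.
        $isFinal = feof($handle);

        if (xml_parse($parser, $chunk, $isFinal) === 0) {
            $error = sprintf(
                'line %d: %s',
                xml_get_current_line_number($parser),
                xml_error_string(xml_get_error_code($parser))
            );
            break;
        }
    }

    fclose($handle);

    return $error;
}
```

Because no element handlers are registered, the parser does nothing except consume input and report the first error code it hits.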
Upvotes: 0
Reputation: 355
Think about what you're saying. You want to do operations on data that is not in memory. That doesn't make sense at all... it will eventually have to be in memory if you want to reference it from operations.
If you don't want to load the data into memory all at once, you could take a divide-and-conquer approach. If the file is incredibly large, you could run a MapReduce job across multiple processes, but this would not decrease the total amount of memory used.
Upvotes: 0