Reputation: 26874
Since ByteArrayInputStream is limited to 2 GB, is there any alternate solution that allows me to store the whole contents of a 2.3 GB (and possibly larger) file in an InputStream to be read by Stax2?
Current code:
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(in); // ByteArrayInputStream????
try
{
    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    Schema schema = factory.newSchema(new StreamSource(schemaInputStream));
    Validator validator = schema.newValidator();
    validator.validate(new StAXSource(xmlStreamReader));
}
finally
{
    xmlStreamReader.close();
}
For performance tuning, the variable in must not come from disk. I have plenty of RAM.
Upvotes: 8
Views: 3278
Reputation: 100013
Use NIO to read the file into a gigantic ByteBuffer, and then create a stream class that reads the ByteBuffer. There are several such implementations floating around in open source.
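A minimal sketch of such an adapter class (the name ByteBufferInputStream is illustrative, not from any particular library). Note that a single ByteBuffer is itself capped at Integer.MAX_VALUE bytes, so a 2.3 GB file would need to be split across several buffers; this shows the wrapping idea only:

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Exposes a ByteBuffer as an InputStream so stream-based APIs can consume it.
public class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;

    public ByteBufferInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override
    public int read() {
        // Return one unsigned byte, or -1 at end of buffer.
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (!buf.hasRemaining()) {
            return -1;
        }
        // Copy as many bytes as are available, up to len.
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n);
        return n;
    }
}
```

The buffer itself could be filled via FileChannel.read (or a MappedByteBuffer), after which the stream can be handed straight to createXMLStreamReader.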
Upvotes: 3
Reputation: 109547
You can save memory by writing the data compressed to a ByteArrayOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
... new GZIPOutputStream(baos));
byte[] bytes = baos.toByteArray(); // < 100 MB?
ByteArrayInputStream ....
And then later wrap the input stream in a GZIPInputStream.
Still a minor slowdown, but this should be ideal for XML, which compresses well.
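A runnable sketch of this compress-then-replay idea (the class and method names are illustrative; the compressed byte[] is still capped at just under 2 GB, which is the assumption behind the "< 100 MB?" estimate above):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBuffer {
    // Compress arbitrary bytes into an in-memory buffer.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(data);
        }
        return baos.toByteArray();
    }

    // Wrap the compressed buffer so a consumer (e.g. the StAX reader)
    // sees the original, uncompressed bytes.
    static InputStream decompressing(byte[] compressed) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(compressed));
    }
}
```

The InputStream returned by decompressing could then be passed to createXMLStreamReader in place of the ByteArrayInputStream.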
Upvotes: -1
Reputation: 17707
The whole point of StAX2 is that you do not need to read the file into memory. You can just supply the source, and let the StAX stream reader pull the data as it needs to.
What additional constraints do you have that you are not showing in your question?
If you have lots of memory, and you want to get good performance, just wrap your InputStream with a large byte buffer, and let the buffer do the buffering for you:
// 4 meg buffer on the stream
InputStream buffered = new BufferedInputStream(schemaInputStream, 1024 * 1024 * 4);
An alternative to solving this in Java is to create a RAM disk and store the file on that, which removes the problem from Java, where your basic limitation is that a single array can hold just under Integer.MAX_VALUE values.
Upvotes: 5
Reputation: 20520
If you have huge quantities of memory, you really won't get any performance improvement anyway. The file is only read in once either way, and the disk cache will ensure it gets done optimally. Just use a disk-based input stream.
Upvotes: 0