Dejell

Reputation: 14317

String max size for 722MB xml file

I have a ByteArrayOutputStream that holds the bytes of an XML document about 750MB in size.

I need to convert it to a String.

I wrote:

ByteArrayOutputStream xmlArchive = ...
String xmlAsString = xmlArchive.toString(UTF8);

However, even with a 4GB heap I get java.lang.OutOfMemoryError: Java heap space.

What is wrong? How can I work out which heap size to use? I am using a 64-bit JDK.

UPDATE

I need it as a String in order to remove all the characters before "<?xml".

Currently my code is:

String xmlAsString = xmlArchive.toString(UTF8);
int xmlBegin = xmlAsString.indexOf("<?xml");
if (xmlBegin > 0) {
    return xmlAsString.substring(xmlBegin);
}
return xmlAsString;

I then convert it back to a byte array.

UPDATE 2

The ByteArrayOutputStream is written like this:

HttpMethod method ..
InputStream response = method.getResponseBodyAsStream();
byte[] buf = new byte[5000];
while ( (len=response.read(buf)) != -1) {
    output.write(buf, 0, len);
}

len is taken from the Content-Length header of the response.

Upvotes: 1

Views: 311

Answers (2)

Cruncher

Reputation: 7812

Expanding on Jamie Cockburn's answer:

To fill in his while loop so that it matches the behaviour you expect:

// nextLine() strips the line terminator, so append one before writing
byte[] buf = (line + "\n").getBytes(StandardCharsets.UTF_8);
output.write(buf, 0, buf.length);
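
Putting the two answers together, a minimal self-contained sketch might look like the following. The class and method names (ScannerXmlCopy, copyFromXmlDeclaration) are hypothetical, and it assumes the response is UTF-8 and really does contain "<?xml"; if it does not, Scanner.skip throws NoSuchElementException after reading the whole stream. Line terminators are normalised to "\n", and a final line without a terminator gains one, so the output is not guaranteed to be byte-identical to the input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper, not taken from either answer.
final class ScannerXmlCopy {

    // Copies everything from the first "<?xml" onwards into output, line by line.
    static void copyFromXmlDeclaration(InputStream response, OutputStream output)
            throws IOException {
        // Closing the Scanner also closes the underlying response stream.
        try (Scanner scanner = new Scanner(response, StandardCharsets.UTF_8.name())) {
            // (?s) lets '.' match line breaks, so junk spanning several lines is skipped.
            scanner.skip("(?s).*?(?=<\\?xml)");

            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // nextLine() drops the terminator, so write one back.
                output.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }

            // Scanner swallows read errors; surface the last one, if any.
            IOException readError = scanner.ioException();
            if (readError != null) {
                throw readError;
            }
        }
    }
}
```

Only one line at a time is held as a String here, so memory use depends on the longest line rather than on the size of the whole document. An XML file written without line breaks would still end up in memory as a single line.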

Upvotes: 1

Jamie Cockburn

Reputation: 7555

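You could use the Scanner class to read the response stream directly, so the whole document never has to be held as a single String: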

Scanner scanner = new Scanner(response, StandardCharsets.UTF_8.name());

// skip to "<?xml"; (?s) lets '.' match line breaks in the leading junk
scanner.skip("(?s).*?(?=<\\?xml)");

// process rest of stream
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // Do something with line
}
scanner.close();
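
As an alternative that is not part of the answer above, the prefix can also be dropped at the byte level, without decoding any text. The sketch below is only an illustration under stated assumptions: the names (XmlDeclarationSeeker, copyFromMarker) are hypothetical, the encoding is assumed to be ASCII-compatible (such as UTF-8), and the simple matcher is only valid for markers whose first byte does not occur again later in the marker, which holds for "<?xml". Unlike the code in the question, it writes nothing and returns false when the marker is never found.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown as a sketch only.
final class XmlDeclarationSeeker {

    private static final byte[] MARKER = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Discards bytes up to the first "<?xml", then copies the marker and the
    // rest of the stream to out. Returns false if the marker never appears.
    static boolean copyFromMarker(InputStream in, OutputStream out) throws IOException {
        InputStream src = new BufferedInputStream(in);
        int matched = 0; // number of marker bytes matched so far
        int b;
        while (matched < MARKER.length && (b = src.read()) != -1) {
            if (b == MARKER[matched]) {
                matched++;
            } else {
                // Restart; the current byte may itself begin a new match.
                matched = (b == MARKER[0]) ? 1 : 0;
            }
        }
        if (matched < MARKER.length) {
            return false; // end of stream reached without finding the marker
        }

        out.write(MARKER);
        byte[] buf = new byte[8192];
        int len;
        while ((len = src.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        return true;
    }
}
```

Because the bytes after the marker are copied unchanged, line endings and encoding are preserved exactly, and the only buffers involved are a few kilobytes in size.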

Upvotes: 2
