guest86

Reputation: 2956

Java - determine size of xml document

I have a simple piece of code that fetches an XML file from a given URL:

DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(link);

That code returns an XML document (org.w3c.dom.Document). I just need the size of the resulting XML document. Is there an elegant way to get it WITHOUT involving third-party JARs?

P.S. I mean the size in KB or MB, not the number of nodes.

Upvotes: 0

Views: 8101

Answers (4)

UVM

Reputation: 9914

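You can do it this way. Record the free memory before building the document:

long start = Runtime.getRuntime().freeMemory();

Construct your XML Document object, then call the same method again:

Document document = parser.getDocument();

long now = Runtime.getRuntime().freeMemory();

// Free memory shrinks as the document is built, so subtract now from start.
System.out.println(" size of Document "+(start - now) );

This only gives a rough estimate of the heap used by the DOM tree, since the garbage collector may run between the two calls.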

Upvotes: 0

vanje

Reputation: 10383

First, naive version: load the file into a local buffer, so that you know the length of your input, and then parse the XML out of the buffer:

URL url = new URL("...");
InputStream in = new BufferedInputStream(url.openStream());
ByteArrayOutputStream buffer1 = new ByteArrayOutputStream();
int c = 0;
while((c = in.read()) >= 0) {
  buffer1.write(c);
}

System.out.println(String.format("Length in Bytes: %d", 
    buffer1.toByteArray().length));

ByteArrayInputStream buffer2 = new ByteArrayInputStream(buffer1.toByteArray());

Document doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder().parse(buffer2);

The drawback is the additional buffer in RAM.

Second, more elegant version: wrap the input stream in a custom java.io.FilterInputStream that counts the bytes streaming through it:

URL url = new URL("...");
CountInputStream in = new CountInputStream(url.openStream());
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
System.out.println(String.format("Bytes: %d", in.getCount()));

Here is the CountInputStream. All read() methods are overridden to delegate to the superclass and count the resulting bytes:

public class CountInputStream extends FilterInputStream {

  private long count = 0L;

  public CountInputStream(InputStream in) {
    super(in);
  }

  public int read() throws IOException {
    final int c = super.read();
    if(c >= 0) {
      count++;
    }
    return c;
  }

  public int read(byte[] b, int off, int len) throws IOException {
    final int bytesRead = super.read(b, off, len);
    if(bytesRead > 0) {
      count += bytesRead;
    }
    return bytesRead;
  }

  public int read(byte[] b) throws IOException {
    // FilterInputStream.read(byte[]) itself calls read(b, 0, b.length),
    // which is counted above. Counting here as well would double the total.
    return read(b, 0, b.length);
  }

  public long getCount() {
    return count;
  }
}

Upvotes: 4

Phebus40

Reputation: 173

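Maybe this (called on the root element, because getTextContent() returns null on the Document node itself):

document.getDocumentElement().getTextContent().getBytes().length;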

Upvotes: -1

Nikhil Dabas

Reputation: 2393

Once you've parsed an XML file into a DOM tree, the source document (as a string of characters) does not exist anymore. You just have a tree of nodes built from that document - so it's no longer possible to accurately determine the size of the source document from a DOM document.

You could transform the DOM document back into an XML file using the identity transform, but that's a really roundabout way of getting the size, and it would still not be an exact match for the source document size.
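For illustration, the identity-transform route could look roughly like the sketch below. It uses only the JDK's javax.xml.transform API. The class name DomSize, the helper CountingOutputStream and the method serializedSize are made-up names for this example, not part of any library. The result is the size of the re-serialized document, which can differ from the original download (whitespace, encoding, XML declaration).

```java
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, named for this example only.
class DomSize {

  /** Discards everything written to it and only counts the bytes. */
  static final class CountingOutputStream extends OutputStream {
    private long count = 0L;

    @Override
    public void write(int b) {
      count++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      count += len;
    }

    long getCount() {
      return count;
    }
  }

  /** Serializes the DOM with an identity transform and returns the byte count. */
  static long serializedSize(Document doc) throws Exception {
    // newTransformer() without a stylesheet yields the identity transform.
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    CountingOutputStream out = new CountingOutputStream();
    identity.transform(new DOMSource(doc), new StreamResult(out));
    return out.getCount();
  }

  public static void main(String[] args) throws Exception {
    byte[] xml = "<a><b>hi</b></a>".getBytes(StandardCharsets.UTF_8);
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    System.out.println("Source bytes:     " + xml.length);
    System.out.println("Serialized bytes: " + serializedSize(doc));
  }
}
```

Because the output stream throws the bytes away, this avoids holding a second copy of the document in memory, but it still costs a full serialization pass.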

For what you're trying to do, the best way would be to download the document yourself, take a note of the size, and then pass it to the DocumentBuilder.parse method using an InputStream.

Upvotes: 0
