Maxim Veksler

Reputation: 30222

Which implementation is better: Cache based on WeakHashMap or cache based on ThreadLocal?

I'm having a hard time deciding between the following two implementations. I want to cache the javax.xml.parsers.DocumentBuilder object per thread. My main concern is runtime performance, hence I'd like to avoid as much GC as possible. Memory is not an issue.

I've written two POC implementations and would be glad to hear the community's pros and cons for each one.

Thanks for the help, guys.

Option #1 - WeakHashMap

import java.io.IOException;
import java.io.StringReader;
import java.util.WeakHashMap;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class DocumentBuilder_WeakHashMap {
    private static final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private static final WeakHashMap<Thread, DocumentBuilder> CACHE = new WeakHashMap<Thread, DocumentBuilder>();

    public static Document documentFromXMLString(String xml) throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilder builder = CACHE.get(Thread.currentThread());
        if(builder == null) {
            builder = factory.newDocumentBuilder();
            CACHE.put(Thread.currentThread(), builder);
        }

        return builder.parse(new InputSource(new StringReader(xml)));
    }

}

Option #2 - ThreadLocal

import java.io.IOException;
import java.io.StringReader;
import java.lang.ref.WeakReference;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class DocumentBuilder_ThreadLocal {
    private static final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private static final ThreadLocal<WeakReference<DocumentBuilder>> CACHE = 
        new ThreadLocal<WeakReference<DocumentBuilder>>() {
            @Override 
            protected WeakReference<DocumentBuilder> initialValue() {
                try {
                    return new WeakReference<DocumentBuilder>(factory.newDocumentBuilder());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        };

    public static Document documentFromXMLString(String xml) throws ParserConfigurationException, SAXException, IOException {
        WeakReference<DocumentBuilder> builderWeakReference = CACHE.get();
        DocumentBuilder builder = builderWeakReference.get();

        if(builder == null) {
            builder = factory.newDocumentBuilder();
            CACHE.set(new WeakReference<DocumentBuilder>(builder));
        }

        return builder.parse(new InputSource(new StringReader(xml)));
    }
}

They both do the same thing (expose documentFromXMLString() to the outside world), so which one would you use?

Thank you, Maxim.

Upvotes: 3

Views: 3437

Answers (3)

rix0rrr

Reputation: 10276

BEWARE!

ThreadLocal will retain a reference to the DocumentBuilder for as long as the thread lives, and the DocumentBuilder holds a reference to the last XML document it parsed.

This has a couple of consequences, which might be considered memory leaks:

  • If the JAXP implementation is loaded by a web application (say, Xerces or Oracle's xmlparser2.jar), the retained reference to the DocumentBuilder keeps the web application's classloader, and with it all of the application's classes, from being collected on undeploy, eventually leading to java.lang.OutOfMemoryError: PermGen space! (Google around for more info on this topic.)
  • If the last XML document parsed by a DocumentBuilder is large, it keeps taking up memory until that thread parses a new document. With long-running threads in a thread pool (such as in a J2EE container), this can become an issue, especially if many large documents need to be parsed. Yes, the memory is eventually released, but the GC cannot reclaim the document while a reference to the DocumentBuilder exists, so you might run out of usable memory before that happens.

Decide if this is relevant to you or not...
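To make the thread-pool point concrete, here is a small sketch (the class name PooledThreadRetention is hypothetical) that uses a single-thread ExecutorService as a stand-in for a container's long-lived worker thread. It shows that a DocumentBuilder cached in a ThreadLocal by one task is still reachable from a later, unrelated task on the same thread, and that ThreadLocal.remove() is what actually releases it. Assuming you can hook the end of each unit of work (for example a finally block around request handling), calling remove() there is one way to bound the retention, at the cost of building a new DocumentBuilder on the next call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class PooledThreadRetention {
    // Plain ThreadLocal with no initialValue(): get() returns null until set() is called on that thread.
    static final ThreadLocal<DocumentBuilder> CACHE = new ThreadLocal<DocumentBuilder>();

    /** Returns {builder visible to a later task, builder visible after remove()}. */
    public static boolean[] run() {
        // One long-lived worker thread, standing in for a pooled container thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Task 1: cache a builder on the worker thread.
            pool.submit(() -> {
                CACHE.set(DocumentBuilderFactory.newInstance().newDocumentBuilder());
                return null;
            }).get();
            // Task 2: an unrelated later task on the same thread still sees (and retains) it.
            boolean retained = pool.submit(() -> CACHE.get() != null).get();
            // Task 3: explicitly release this thread's entry.
            pool.submit(() -> CACHE.remove()).get();
            // Task 4: nothing is cached on the thread any more.
            boolean afterRemove = pool.submit(() -> CACHE.get() != null).get();
            return new boolean[] { retained, afterRemove };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling PooledThreadRetention.run() should yield {true, false}: the entry survives across tasks until remove() is called.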

Upvotes: 4

Hardcoded

Reputation: 6494

The WeakHashMap alone will fail, because it is not thread-safe:
"Like most collection classes, this class is not synchronized."
(third paragraph of the Javadoc)

Since synchronization takes time and Collections.synchronizedMap won't scale very well, you should stick with ThreadLocal.
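For comparison, this is roughly what Option #1 looks like once it is made thread-safe with Collections.synchronizedMap. It is a sketch: the class name is hypothetical, and the checked parser exceptions are converted to unchecked ones only to keep the example short. The single lock taken on every get/put is the scaling cost mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DocumentBuilder_SynchronizedWeakHashMap {
    private static final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Weak keys let entries for dead threads be collected; the synchronized wrapper
    // makes every get/put lock one shared monitor, which is where contention comes from.
    private static final Map<Thread, DocumentBuilder> CACHE =
            Collections.synchronizedMap(new WeakHashMap<Thread, DocumentBuilder>());

    public static Document documentFromXMLString(String xml) {
        Thread current = Thread.currentThread();
        try {
            DocumentBuilder builder = CACHE.get(current);
            if (builder == null) {
                // Only the current thread ever writes its own key, so this get-then-put
                // cannot race with another thread for the same entry.
                synchronized (factory) { // DocumentBuilderFactory is not guaranteed to be thread-safe
                    builder = factory.newDocumentBuilder();
                }
                CACHE.put(current, builder);
            }
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```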

Upvotes: 3

pgras

Reputation: 12780

The ThreadLocal solution is better, as long as you drop the WeakReference and use a ThreadLocal<DocumentBuilder> directly (a weakly referenced builder can be cleared by any GC cycle and then has to be recreated, which works against the goal of avoiding GC). Access to a ThreadLocal value is faster because the thread directly references an array containing all of its ThreadLocal values, so a lookup only has to compute the index into that array. Look at the ThreadLocal source to see why the index computation is fast (int index = hash & values.mask;).
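A minimal sketch of the approach described above, assuming Java 7+ (for multi-catch): a ThreadLocal that holds the DocumentBuilder directly, with no WeakReference, so each thread creates its builder once and the GC never silently discards it. The class name and the builderForCurrentThread() helper are hypothetical, and creating one factory per thread is a choice made here because DocumentBuilderFactory is not guaranteed to be thread-safe.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DocumentBuilder_PlainThreadLocal {
    // Strong reference: the builder lives as long as the thread (or until remove() is called).
    private static final ThreadLocal<DocumentBuilder> CACHE = new ThreadLocal<DocumentBuilder>() {
        @Override
        protected DocumentBuilder initialValue() {
            try {
                // One factory per thread, so no factory instance is shared across threads.
                return DocumentBuilderFactory.newInstance().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
    };

    public static Document documentFromXMLString(String xml) {
        try {
            return CACHE.get().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Hypothetical helper: returns the same builder for repeated calls on one thread. */
    public static DocumentBuilder builderForCurrentThread() {
        return CACHE.get();
    }
}
```

Because the reference is strong, this version trades the occasional rebuild of Option #2 for the per-thread retention discussed in the first answer.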

Upvotes: 6
