Reputation: 54242
So I'm working on a speed contest in Java. I have (number of processors) threads doing work, and they all need to add to a binary tree. Originally I just used a synchronized add method, but I wanted to make it so threads could follow each other through the tree (each thread only has the lock on the object it's accessing). Unfortunately, even for a very large file (48,000 lines), my new binary tree is slower than the old one. I assume this is because I'm getting and releasing a lock every time I move in the tree. Is this the best way to do this or is there a better way?
Each node has a ReentrantLock named lock, and getLock() and releaseLock() just call lock.lock() and lock.unlock();
My code:
public void add(String sortedWord, String word) {
    synchronized (this) {
        if (head == null) {
            head = new TreeNode(sortedWord, word);
            return;
        }
        head.getLock();
    }
    TreeNode current = head, previous = null;
    while (current != null) {
        // If this is an anagram of another word in the list..
        if (current.getSortedWord().equals(sortedWord)) {
            current.add(word);
            current.releaseLock();
            return;
        }
        // New word is less than current word
        else if (current.compareTo(sortedWord) > 0) {
            previous = current;
            current = current.getLeft();
            if (current != null) {
                current.getLock();
                previous.releaseLock();
            }
        }
        // New word greater than current word
        else {
            previous = current;
            current = current.getRight();
            if (current != null) {
                current.getLock();
                previous.releaseLock();
            }
        }
    }
    if (previous.compareTo(sortedWord) > 0) {
        previous.setLeft(sortedWord, word);
    }
    else {
        previous.setRight(sortedWord, word);
    }
    previous.releaseLock();
}
EDIT: Just to clarify, my code is structured like this: the main thread reads input from a file and adds the words to a queue; each worker thread pulls words from the queue and does some work (including sorting them and adding them to the binary tree).
Upvotes: 2
Views: 8303
Reputation: 1938
You seem to have implemented a Binary Search Tree, not a B-Tree.
Anyway, have you considered using a ConcurrentSkipListMap? This is an ordered data structure (introduced in Java 6), which should have good concurrency.
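For illustration, a minimal sketch of what the swap might look like, assuming the anagram groups are plain lists (AnagramIndex and its fields are placeholder names, not from the question):

import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

class AnagramIndex {
    // Keys are the sorted letters; values are the anagram groups.
    private final ConcurrentSkipListMap<String, List<String>> map =
            new ConcurrentSkipListMap<String, List<String>>();

    public void add(String sortedWord, String word) {
        List<String> group = map.get(sortedWord);
        if (group == null) {
            List<String> fresh = new CopyOnWriteArrayList<String>();
            // putIfAbsent is atomic: it returns the existing value if
            // another thread inserted first, or null if ours won.
            group = map.putIfAbsent(sortedWord, fresh);
            if (group == null) {
                group = fresh;
            }
        }
        group.add(word);
    }
}

No explicit locking is needed anywhere; the map and the copy-on-write lists handle the coordination.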
Upvotes: 1
Reputation: 2051
I would actually start by looking at the use of compare() and equals() and see if something can be improved there. You might wrap your String object in another class with a different compare() method, optimized for your use case. For instance, consider checking hashCode() before falling back to equals(): unequal hash codes mean the strings cannot be equal, and the hash code is cached, so future calls will be that much faster.
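A minimal sketch of such a wrapper (the Key class and its layout are hypothetical):

import java.util.Objects;

final class Key implements Comparable<Key> {
    private final String sorted;
    private final int hash; // String caches this too; kept explicit here

    Key(String sorted) {
        this.sorted = sorted;
        this.hash = sorted.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key other = (Key) o;
        // Cheap inequality check: different hashes mean different strings.
        return hash == other.hash && sorted.equals(other.sorted);
    }

    @Override
    public int hashCode() { return hash; }

    @Override
    public int compareTo(Key other) { return sorted.compareTo(other.sorted); }
}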
Consider interning the strings. I don't know if the VM will accept that many strings, but it's worth checking out.
(this was going to be a comment to an answer but got too wordy).
When reading the nodes you need to get a read-lock for each node as you reach it. If you read-lock the whole tree then you gain nothing. Once you reach the node you want to modify, you release the read lock for that node and try to acquire the write lock. Code would be something like:
TreeNode current; // add a ReentrantReadWriteLock to each node

// enter the current node:
current.getLock().readLock().lock();
if (isTheRightPlace(current)) {
    current.getLock().readLock().unlock();
    current.getLock().writeLock().lock();
    // NOTE: another thread may have changed the node between the read
    // unlock and the write lock, so re-check the condition here.
    // do stuff then release lock
    current.getLock().writeLock().unlock();
} else {
    current.getLock().readLock().unlock();
}
Upvotes: 3
Reputation: 15907
Another thing: there definitely is no place for a binary tree in performance-critical code. The caching behaviour will kill all performance. It should have a much larger fan-out (one cache line). [edit] With a binary tree you access too much non-contiguous memory. Take a look at the material on Judy trees.
And you probably want to start with a radix of at least one character before starting the tree.
And do the compare on an int key instead of a string first.
And perhaps look at tries.
And get rid of all the threads and synchronization. Just try to make the problem memory access bound.
[edit] I would do this a bit differently. I would use a thread for each first character of the string, and give each its own BTree (or perhaps a Trie). I'd put a non-blocking work queue in each thread and fill them based on the first character of the string. You can get even more performance by presorting the add queue and doing a merge sort into the BTree. In the BTree, I'd use int keys representing the first 4 characters, only referring to the strings in the leaf pages.
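A rough sketch of that sharding idea, using a plain TreeMap per worker as a stand-in for the BTree/Trie, and routing on the first character of the sorted form so that anagrams always reach the same single-threaded worker (all names here are illustrative, and lowercase a-z input is assumed):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ShardedAnagramIndex {
    private final List<BlockingQueue<String[]>> queues =
            new ArrayList<BlockingQueue<String[]>>();

    ShardedAnagramIndex() {
        for (int i = 0; i < 26; i++) {
            final BlockingQueue<String[]> queue =
                    new LinkedBlockingQueue<String[]>();
            queues.add(queue);
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    // Owned exclusively by this thread: no synchronization.
                    TreeMap<String, List<String>> tree =
                            new TreeMap<String, List<String>>();
                    try {
                        while (true) {
                            String[] pair = queue.take(); // {sortedWord, word}
                            List<String> group = tree.get(pair[0]);
                            if (group == null) {
                                group = new ArrayList<String>();
                                tree.put(pair[0], group);
                            }
                            group.add(pair[1]);
                        }
                    } catch (InterruptedException e) {
                        // shut down quietly
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void add(String sortedWord, String word) {
        // Same sorted form => same shard, so anagram groups never split.
        queues.get(sortedWord.charAt(0) - 'a')
              .add(new String[] { sortedWord, word });
    }
}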
In a speed contest, you hope to be memory access bound, and therefore have no use for threads. If not, you're still doing too much processing per string.
Upvotes: 3
Reputation: 7147
You may try using an upgradeable read/write lock (maybe it's called an upgradeable shared lock or the like; I do not know what Java provides): use a single RWLock for the whole tree. Before traversing the B-Tree, you acquire the read (shared) lock and you release it when done (one acquire and one release in the add method, not more).
At the point where you have to modify the B-Tree, you acquire the write (exclusive) lock (or "upgrade" from read to write lock), insert the node and downgrade to read (shared) lock.
With this technique the synchronization for checking and inserting the head node can also be removed!
It should look something like this:
public void add(String sortedWord, String word) {
    lock.read();
    if (head == null) {
        lock.upgrade();
        head = new TreeNode(sortedWord, word);
        lock.downgrade();
        lock.unlock();
        return;
    }
    TreeNode current = head, previous = null;
    while (current != null) {
        if (current.getSortedWord().equals(sortedWord)) {
            lock.upgrade();
            current.add(word);
            lock.downgrade();
            lock.unlock();
            return;
        }
        // .. more tree traversal, do not touch the lock here ..
        // ...
    }
    if (previous.compareTo(sortedWord) > 0) {
        lock.upgrade();
        previous.setLeft(sortedWord, word);
        lock.downgrade();
    }
    else {
        lock.upgrade();
        previous.setRight(sortedWord, word);
        lock.downgrade();
    }
    lock.unlock();
}
Unfortunately, after some googling I could not find a suitable "upgradeable" rwlock for Java. ReentrantReadWriteLock is not upgradeable; however, instead of upgrading you can unlock the read lock, then lock the write lock, and (very important) re-check the condition that led to these lines (e.g. if (current.getSortedWord().equals(sortedWord)) { ... }). This is important because another thread may have changed things between the read unlock and the write lock.
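A minimal, self-contained sketch of that unlock/re-lock/re-check dance (Node is a trimmed-down stand-in for the question's TreeNode, and the removed flag is hypothetical, just to give the re-check something that can actually change between the two locks):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Node {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> words = new ArrayList<String>();
    private boolean removed = false;

    boolean tryAdd(String word) {
        lock.readLock().lock();
        boolean usable;
        try {
            usable = !removed;
        } finally {
            // ReentrantReadWriteLock cannot upgrade, so drop the read lock.
            lock.readLock().unlock();
        }
        if (!usable) {
            return false;
        }
        lock.writeLock().lock();
        try {
            // Re-check: another thread may have removed the node between
            // the read unlock above and the write lock here.
            if (removed) {
                return false;
            }
            words.add(word);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}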
For details, check this question and its answers.
In the end, traversal of the B-tree runs in parallel; only when a target node is found does the thread acquire an exclusive lock, and other threads block only for the duration of the insertion.
Upvotes: 2
Reputation: 40659
I've got a dumb question: since you're reading and modifying a file, you're going to be totally limited by how fast the read/write head can move around and the disk can rotate. So what good is it to use threads and processors? The disk can't do two things at once.
Or is this all in RAM?
ADDED: OK, it's not clear to me how much parallelism can help you here (some, maybe), but regardless, what I would suggest is squeezing every cycle you can out of each thread. This is what I'm talking about. For example, I wonder if innocent-looking sleeper code like those calls to "get" and "compare" methods is taking a larger share of the time than you might expect. If it is, you might be able to do each of them once rather than two or three times; that sort of thing.
Upvotes: 0
Reputation: 269667
Locking and unlocking is overhead, and the more you do it, the slower your program will be.
On the other hand, decomposing a task and running portions in parallel will make your program complete more quickly.
Where the "break-even" point lies is highly dependent on the amount of contention for a particular lock in your program, and the system architecture on which the program is run. If there is little contention (as there appears to be in this program) and many processors, this might be a good approach. However, as the number of threads decreases, the overhead will dominate and a concurrent program will be slower. You have to profile your program on the target platform to determine this.
Another option to consider is a non-locking approach using immutable structures. Rather than modifying a list, for example, you could append the old (linked) list to a new node, then, with a compareAndSet operation on an AtomicReference, ensure that you won the data race to set the words collection in the current tree node. If not, try again. You could use AtomicReference for the left and right children in your tree nodes too. Whether this is faster or not, again, would have to be tested on your target platform.
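A hedged sketch of that lock-free word list (WordList and LockFreeNode are made-up names): each add builds a new immutable cons cell and publishes it with compareAndSet, retrying if it lost the race.

import java.util.concurrent.atomic.AtomicReference;

class LockFreeNode {
    private static final class WordList {
        final String word;
        final WordList rest; // null terminates the list

        WordList(String word, WordList rest) {
            this.word = word;
            this.rest = rest;
        }
    }

    private final AtomicReference<WordList> words =
            new AtomicReference<WordList>(null);

    void add(String word) {
        while (true) {
            WordList current = words.get();
            WordList updated = new WordList(word, current);
            // Succeeds only if no other thread replaced the head meanwhile.
            if (words.compareAndSet(current, updated)) {
                return;
            }
            // Lost the race: loop and retry against the new head.
        }
    }
}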
Upvotes: 2
Reputation: 2536
I would say that doing it this way is not the way to go, without even taking the synchronization performance issues into account.
The fact that this implementation is slower than the original fully synchronized version may be a problem, but a bigger problem is that the locking in this implementation is not at all robust.
Imagine, for example, that you pass null in for sortedWord; this will result in a NullPointerException being thrown, which means the current thread ends up holding onto the lock and leaves your data structure in an inconsistent state. On the other hand, if you just synchronize this method, you don't have to worry about such things. Considering the synchronized version is faster as well, it's an easy choice to make.
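If you do keep per-node locks, the usual defence is to pair every acquire with a release in a finally block. A minimal sketch (SafeNode and withLock are hypothetical names, not from the question):

import java.util.concurrent.locks.ReentrantLock;

class SafeNode {
    private final ReentrantLock lock = new ReentrantLock();

    void withLock(Runnable action) {
        lock.lock();
        try {
            action.run(); // may throw, e.g. a NullPointerException
        } finally {
            lock.unlock(); // always runs, so the lock is never stranded
        }
    }
}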
Upvotes: 1
Reputation: 59811
Considering one dataset per line, 48k lines isn't all that much, and you can only make wild guesses as to how your operating system and the virtual machine are going to mangle your file I/O to make it as fast as possible.
Trying to use a producer/consumer paradigm can be problematic here, as you have to balance the overhead of locks against the actual amount of I/O carefully. You might get better performance if you just try to improve the way you do the file I/O (consider something like mmap()).
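For what it's worth, the closest Java analogue to mmap() is FileChannel.map(). A rough sketch, assuming an ASCII input file at a placeholder path:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("words.txt", "r");
        FileChannel channel = file.getChannel();
        // Map the whole file and let the OS page it in on demand.
        MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        StringBuilder line = new StringBuilder();
        while (buffer.hasRemaining()) {
            char c = (char) buffer.get(); // byte-to-char cast assumes ASCII
            if (c == '\n') {
                // hand line.toString() to the work queue here
                line.setLength(0);
            } else {
                line.append(c);
            }
        }
        channel.close();
        file.close();
    }
}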
Upvotes: 1