Freek8

Reputation: 704

Threads in Java not running concurrently

I'm having a problem with multi-threading in Java. I need to compare a large list of names to itself (to find near-duplicates).

I've split up the work into 4 different threads, each comparing 1/4 of the list to the complete list. I use the same class for all 4 threads.

When I look at the thread monitor, I see that they are not really running concurrently; they are active one after another.

What could be the problem?

This is the run-method of my thread-class:

@Override
public void run() {
    try {
        s = settings.conn.createStatement();
        JaroWinklerDistance jw = JaroWinklerDistance.JARO_WINKLER_DISTANCE;

        for (int i = 0; i < names.size(); i++) {
            for (int j = 0; j < allNames.size(); j++) {
                if (j % 250 == 0) {
                }
                double proximity = jw.proximity(names.get(i), allNames.get(j));
                if (proximity > Double.parseDouble(settings.properties.getProperty("distanceTreshold")) && proximity < 1.00) {
                    if (names.get(i).length() > allNames.get(j).length()) {
                        substituteName(allNames.get(j), names.get(i));
                        allNames.remove(allNames.get(j));
                    } else {
                        substituteName(names.get(i), allNames.get(j));
                        names.remove(names.get(i));
                        break;
                    }
                }
            }
        }
    } catch (SQLException ex) {
        Exceptions.printStackTrace(ex);
    }
}

The substituteName-method executes an SQL-query that updates the records.

The threads are created as follows:

settings.getAllNames();
int size = settings.allNames.size();
int rest = size % 4;
int groupSize = (size - rest) / 4;

GroupNormalizer a = new GroupNormalizer(settings.allNames, new ArrayList<String>(settings.allNames.subList(0, groupSize)));
GroupNormalizer b = new GroupNormalizer(settings.allNames, new ArrayList<String>(settings.allNames.subList(groupSize, groupSize * 2)));
GroupNormalizer c = new GroupNormalizer(settings.allNames, new ArrayList<String>(settings.allNames.subList(groupSize * 2, groupSize * 3)));
GroupNormalizer d = new GroupNormalizer(settings.allNames, new ArrayList<String>(settings.allNames.subList(groupSize * 3, groupSize * 4 + rest)));
a.start();
b.start();
c.start();
d.start();

EDIT: All 4 threads alternate a lot between the running and monitor (blocked) states.

Upvotes: 2

Views: 1704

Answers (4)

dalelane

Reputation: 2765

There is a known issue with using Double.parseDouble in Java from multiple threads concurrently. The method it uses internally to do the parsing is synchronized, so if you have a lot of threads calling it at the same time, the threads end up blocking.

This should be fixed in Java 8.

(see Java bug report JI-9004591 - "Monitor contention when calling Double.parseDouble from multiple threads" - https://bugs.java.com/bugdatabase/view_bug?bug_id=7032154 )

I suspect this is the reason why the change made in the accepted answer (moving the Double.parseDouble out) improved performance.

Upvotes: 1

light_303

Reputation: 2111

Hmm, it looks like this line is causing the synchronization lock-up:

if (proximity > Double.parseDouble(settings.properties.getProperty("distanceTreshold")) && proximity < 1.00)

Try to pull the Double.parseDouble call out of the loop, since everything in it looks constant to me.

It seems the settings object blocks on access, which slows you down.
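A minimal sketch of that change, assuming the threshold does not change while the threads run. ThresholdDemo, readThreshold, countAbove and the sample values are hypothetical names made up for illustration; only the property key "distanceTreshold" and the comparison itself come from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class ThresholdDemo {

    // Parse the threshold once, before any loop runs.
    static double readThreshold(Properties properties) {
        return Double.parseDouble(properties.getProperty("distanceTreshold"));
    }

    // Counts scores strictly between the threshold and 1.0, mirroring the
    // condition in the question without touching Properties inside the loop.
    static int countAbove(List<Double> scores, double threshold) {
        int count = 0;
        for (double score : scores) {
            if (score > threshold && score < 1.00) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("distanceTreshold", "0.9"); // sample value, not from the question
        double threshold = readThreshold(properties);      // hoisted out of the loop
        System.out.println(countAbove(Arrays.asList(0.5, 0.95, 1.0, 0.92), threshold));
    }
}
```

In the question's run() method this would mean reading the threshold into a local double before the outer for loop and comparing proximity against that local.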

Also, it looks like you are accessing a DB during your calculation (you catch an SQLException), which will slow you down by a very large factor. Try to separate reading and writing from the calculation process.
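One way to separate the two could look like the sketch below: the comparison phase only collects the renames it finds, and nothing talks to the database until it is done. CollectThenWrite, Substitution and findSubstitutions are hypothetical names, and the proximity function is passed in as a parameter so the sketch does not depend on any particular string-distance library. It also does not reproduce the list removals and the break from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class CollectThenWrite {

    // One pending rename: replace 'from' with 'to' in the database later.
    static final class Substitution {
        final String from;
        final String to;

        Substitution(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Pure computation: no JDBC calls and no mutation of the input lists.
    static List<Substitution> findSubstitutions(List<String> names,
                                                List<String> allNames,
                                                ToDoubleBiFunction<String, String> proximity,
                                                double threshold) {
        List<Substitution> result = new ArrayList<>();
        for (String name : names) {
            for (String other : allNames) {
                double p = proximity.applyAsDouble(name, other);
                if (p > threshold && p < 1.00) {
                    // The shorter name is replaced by the longer one,
                    // matching the argument order of the question's calls.
                    if (name.length() > other.length()) {
                        result.add(new Substitution(other, name));
                    } else {
                        result.add(new Substitution(name, other));
                    }
                }
            }
        }
        return result;
    }
}
```

Once every worker has returned its list, a single thread can loop over the collected substitutions and call substituteName for each, so the SQL updates no longer interleave with the comparisons.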

Upvotes: 2

aviad

Reputation: 8278

Try ForkJoin.
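A rough sketch of what that could look like, assuming the comparison itself is read-only. NearDuplicateCount, the CUTOFF value and the proximity parameter are hypothetical and only illustrate the fork/join pattern; the task counts candidate pairs instead of updating the database.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.ToDoubleBiFunction;

final class NearDuplicateCount extends RecursiveTask<Integer> {
    private static final int CUTOFF = 64; // assumed chunk size; tune for real data

    private final List<String> names;
    private final int from;
    private final int to;
    private final ToDoubleBiFunction<String, String> proximity;
    private final double threshold;

    NearDuplicateCount(List<String> names, int from, int to,
                       ToDoubleBiFunction<String, String> proximity, double threshold) {
        this.names = names;
        this.from = from;
        this.to = to;
        this.proximity = proximity;
        this.threshold = threshold;
    }

    @Override
    protected Integer compute() {
        if (to - from <= CUTOFF) {
            // Small enough: compare this slice against the complete list.
            int count = 0;
            for (int i = from; i < to; i++) {
                for (int j = 0; j < names.size(); j++) {
                    double p = proximity.applyAsDouble(names.get(i), names.get(j));
                    if (p > threshold && p < 1.00) {
                        count++;
                    }
                }
            }
            return count;
        }
        // Otherwise split the range in half and process both halves in parallel.
        int mid = (from + to) >>> 1;
        NearDuplicateCount left = new NearDuplicateCount(names, from, mid, proximity, threshold);
        NearDuplicateCount right = new NearDuplicateCount(names, mid, to, proximity, threshold);
        left.fork();
        return right.compute() + left.join();
    }

    // Convenience entry point that runs the whole list on the common pool.
    static int count(List<String> names,
                     ToDoubleBiFunction<String, String> proximity, double threshold) {
        return ForkJoinPool.commonPool()
                .invoke(new NearDuplicateCount(names, 0, names.size(), proximity, threshold));
    }
}
```

The list is only read here, never modified, so the subtasks need no locking.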

Upvotes: 1

d1e

Reputation: 6442

Executor Framework (thread pools) to the rescue!

A thread pool manages a set of worker threads and contains a work queue that holds tasks waiting to be executed.
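As a sketch only: the example below submits one task per slice of the list to a fixed-size pool and adds up the results. PoolDemo, countMatches and the match predicate are hypothetical names standing in for the real comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiPredicate;

final class PoolDemo {

    // Splits allNames into roughly equal slices, compares each slice against
    // the complete list on a pool thread, and returns the total match count.
    static int countMatches(List<String> allNames, int workers,
                            BiPredicate<String, String> match) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = Math.max(1, (allNames.size() + workers - 1) / workers);
            List<Future<Integer>> futures = new ArrayList<>();
            for (int start = 0; start < allNames.size(); start += chunk) {
                final List<String> slice =
                        allNames.subList(start, Math.min(allNames.size(), start + chunk));
                Callable<Integer> task = () -> {
                    int count = 0;
                    for (String a : slice) {
                        for (String b : allNames) {
                            if (match.test(a, b)) {
                                count++;
                            }
                        }
                    }
                    return count;
                };
                futures.add(pool.submit(task)); // queued until a worker is free
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a comparison task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The tasks only read the shared list. If they need to change it, as the code in the question does, that access has to be coordinated or moved out of the tasks.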

Upvotes: 3
