mattg
mattg

Reputation: 1855

Concurrent two-way map in java

I am writing code for text processing, and things go a lot faster if I convert the strings to integers first. To do this I made a Dictionary class, where every time I see a new string, I give it an index, and keep two maps, one from string to int, and one from int to string, so I can easily look up both ways. Here's the code:

class Dictionary {
    private Map<String, Integer> map;
    private Map<Integer, String> reverse_map;
    private int nextIndex;

    public Dictionary() {
        map = new HashMap<String, Integer>();
        reverse_map = new HashMap<Integer, String>();
        nextIndex = 1;
    }

    public int getIndex(String string) {
        if (!map.containsKey(string)) {
            map.put(string, nextIndex);
            reverse_map.put(nextIndex, string);
            nextIndex++;
        }
        return map.get(string);
    }

    public String getString(int index) {
        // getIndex is always called first, so we don't need to check anything
        return reverse_map.get(index);
    }
}

This has been working fine for me in my single-threaded code. But now I want to give this multiple threads to speed it up more, and I'm not sure how to do it. I thought of using ConcurrentHashMap, but I'm not sure that putIfAbsent will guarantee that I don't use an index twice. I didn't want to use Collections.synchronizedMap, because this dictionary is accessed really frequently across the threads and so I probably wouldn't be much better off than with a single thread, because it blocks on every read and write. Is there a way to make this work?

Upvotes: 1

Views: 1067

Answers (2)

Keith Randall
Keith Randall

Reputation: 23265

The simplest thing is to just label your two methods (getIndex and getString) synchronized. See what kind of speedup you get. Maybe it will be enough.

To use ConcurrentHashMap, you might try this:

private AtomicInteger nextIndex;
public int getIndex(String string) {
    Integer n = map.get(string);
    if (n == null) {
        int idx = nextIndex.getAndIncrement();
        n = map.putIfAbsent(string, idx);
        if (n != null) return n;
        reverse_map.put(idx, string);
        return idx;
    }
    return n;
}

This may occasionally skip an index if two threads insert the same string at the same time, but it won't be often.

Upvotes: 1

Marko Topolnik
Marko Topolnik

Reputation: 200206

Your issue with a concurrent solution is atomicity. These are my thoughts:

private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
private final ConcurrentMap<Integer, String> reverse_map = new ConcurrentHashMap<Integer, String>();
private final AtomicInteger nextIndex = new AtomicInteger(1);

public int getIndex(String string) {
  Integer i = map.get(string);
  if (i == null) {
    final Integer newI = nextIndex.getAndIncrement();
    i = map.putIfAbsent(string, newI);
    if (i == null) {
      reverse_map.put(newI, string);
      return newI;
    }
  }
  return i;
}

This has a very benign failure mode: some indices will be left unused.

Notice that I put to the second map unconditionally since at this point I know I'm in charge for the string at hand.

Upvotes: 2

Related Questions