Reputation: 1855
I am writing code for text processing, and things go a lot faster if I convert the strings to integers first. To do this I made a Dictionary class, where every time I see a new string, I give it an index, and keep two maps, one from string to int, and one from int to string, so I can easily look up both ways. Here's the code:
class Dictionary {
private Map<String, Integer> map;
private Map<Integer, String> reverse_map;
private int nextIndex;
public Dictionary() {
map = new HashMap<String, Integer>();
reverse_map = new HashMap<Integer, String>();
nextIndex = 1;
}
public int getIndex(String string) {
if (!map.containsKey(string)) {
map.put(string, nextIndex);
reverse_map.put(nextIndex, string);
nextIndex++;
}
return map.get(string);
}
public String getString(int index) {
// getIndex is always called first, so we don't need to check anything
return reverse_map.get(index);
}
}
This has been working fine for me in my single-threaded code. But now I want to give this multiple threads to speed it up more, and I'm not sure how to do it. I thought of using ConcurrentHashMap, but I'm not sure that putIfAbsent
will guarantee that I don't use an index twice. I didn't want to use Collections.synchronizedMap, because this dictionary is accessed really frequently across the threads and so I probably wouldn't be much better off than with a single thread, because it blocks on every read and write. Is there a way to make this work?
Upvotes: 1
Views: 1067
Reputation: 23265
The simplest thing is to just label your two methods (getIndex
and getString
) synchronized
. See what kind of speedup you get. Maybe it will be enough.
To use ConcurrentHashMap
, you might try this:
private AtomicInteger nextIndex;
public int getIndex(String string) {
Integer n = map.get(string);
if (n == null) {
int idx = nextIndex.getAndIncrement();
n = map.putIfAbsent(string, idx);
if (n != null) return n;
reverse_map.put(idx, string);
return idx;
}
return n;
}
This may occasionally skip an index if two threads insert the same string at the same time, but it won't be often.
Upvotes: 1
Reputation: 200206
Your issue with a concurrent solution is atomicity. These are my thoughts:
private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
private final ConcurrentMap<Integer, String> reverse_map = new ConcurrentHashMap<Integer, String>();
private final AtomicInteger nextIndex = new AtomicInteger(1);
public int getIndex(String string) {
Integer i = map.get(string);
if (i == null) {
final Integer newI = nextIndex.getAndIncrement();
i = map.putIfAbsent(string, newI);
if (i == null) {
reverse_map.put(newI, string);
return newI;
}
}
return i;
}
This has a very benign failure mode: some indices will be left unused.
Notice that I put to the second map unconditionally since at this point I know I'm in charge for the string at hand.
Upvotes: 2