Reputation: 2180
I am working on a text document parsing application.
The design of the document is as shown in the figure.
Here is how the parsing is being done:
The Document contains an ArrayList of pages. Each page has a Map<Float, List<Character>>: the Float key holds the y-axis value of a character's location, and each Character carries additional information about that character.
Parsing is done character by character via a third-party library. Please add comments if more information is required.
For parsing, I have created two ExecutorService thread pools: one for pages and one for populating the map.
I initially create a Document and submit each page to the page parser as a Runnable on the ExecutorService, which in turn passes an empty map to the text parser. The text parser checks whether the map already has the key: if so, it adds the character to the existing list; otherwise it creates a new list for that key.
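The check-then-add step described above can be sketched roughly as follows. This is only an illustration, not the actual parser code: Glyph is a hypothetical stand-in for the library's Character type, and addGlyph is a name made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineGrouping {
    // Hypothetical stand-in for the third-party library's character type.
    static final class Glyph {
        final char value;
        final float x;
        Glyph(char value, float x) { this.value = value; this.x = x; }
    }

    // Adds the glyph to the list stored under its y value,
    // creating that list first if the key is not present yet.
    static void addGlyph(Map<Float, List<Glyph>> lines, float y, Glyph glyph) {
        lines.computeIfAbsent(y, key -> new ArrayList<>()).add(glyph);
    }

    public static void main(String[] args) {
        // A plain (non-thread-safe) map: safe only while a single thread uses it.
        Map<Float, List<Glyph>> lines = new TreeMap<>();
        addGlyph(lines, 10.0f, new Glyph('H', 0f));
        addGlyph(lines, 10.0f, new Glyph('i', 5f));
        addGlyph(lines, 22.5f, new Glyph('!', 0f));
        System.out.println(lines.size() + " distinct y values");
    }
}
```

Note that a HashMap or TreeMap used like this is not safe when several threads write to the same map at once.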
The problem is that this work could be done concurrently for all pages to speed up execution, but I am unable to do so with this data structure: with a plain map the threads get stuck partway through, and with a synchronized map from Collections.synchronizedMap it takes a long time.
Also, I am maintaining two different lists of Future objects to check if the threads finished.
Please suggest improvements that would allow concurrent execution and speed up the parsing.
Upvotes: 0
Views: 58
Reputation: 18148
If each page has its own Map<Float, List<Character>>, then never have more than one thread processing a single page; that way you won't need to synchronize access to the Map or use a concurrent Map implementation. You can statically partition the pages among your workers as JB Nizet suggests in the comments; another option would be to place all pages in a ConcurrentLinkedQueue and have the workers poll the queue for pages to parse, terminating when the queue is empty (poll returns null). Either way, you'll only need one ExecutorService, since each worker is responsible for both parsing and map population.
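A minimal sketch of the queue-based variant, under some loud assumptions: Page, parse and parseAll are hypothetical names, the "parser" is a placeholder that treats each newline as a new y value, and java.lang.Character stands in for the library's richer character type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PageQueueDemo {
    // Hypothetical page type: each page owns its own plain, unsynchronized map.
    static final class Page {
        final String text;
        final Map<Float, List<Character>> lines = new TreeMap<>();
        Page(String text) { this.text = text; }
    }

    // Placeholder for the third-party parser: a newline moves to the next y value.
    static void parse(Page page) {
        float y = 0f;
        for (char c : page.text.toCharArray()) {
            if (c == '\n') { y += 1f; continue; }
            page.lines.computeIfAbsent(y, key -> new ArrayList<>()).add(c);
        }
    }

    // One pool; each worker polls pages until the queue is empty (poll returns null).
    static void parseAll(List<Page> pages, int workers) throws InterruptedException {
        Queue<Page> queue = new ConcurrentLinkedQueue<>(pages);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                Page page;
                while ((page = queue.poll()) != null) {
                    parse(page); // only this worker ever touches this page's map
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("parsing did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Page> pages = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            pages.add(new Page("ab\ncde"));
        }
        parseAll(pages, 4);
        System.out.println("lines on first page: " + pages.get(0).lines.size());
    }
}
```

Since a page leaves the queue exactly once, no two workers share a map. Waiting on awaitTermination also replaces the two lists of Future objects with a single completion check.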
Upvotes: 2