Sanyam Goel

Reputation: 2180

Unable to figure out the correct data structure and correct approach in this scenario of parsing text

I am working on a text document parsing application.

The design of the document is as shown in the figure.

Here is how the parsing is being done:

  1. The Document contains an ArrayList of pages.

  2. Each page has a Map<Float, List<Character>>.

The Float key holds the y-axis value of a character's location, and each Character holds the other information about that character.

Parsing is done character by character via a third-party library. Please add comments if more information is required.

For parsing, I have created two ExecutorService thread pools: one for pages and one for populating the map.

I initially create a Document and pass each page to the page parser as a Runnable submitted to the ExecutorService. The page parser in turn passes an empty map to the text parser.

The text parser checks whether the map already contains the key. If it does, the character is added to the existing list; otherwise a new list is created for that key and the character is added to it.

The problem is that this work could be done concurrently for all pages to speed up execution, but I am unable to do so with this data structure. If the map is accessed without synchronization, all threads get stuck partway through, and if access is synchronized using Collections.synchronizedMap, parsing takes a lot of time.

Also, I am maintaining two different lists of Future objects to check whether the threads have finished.

Kindly provide valuable suggestions for improvements and concurrent execution to speed up the execution.

Upvotes: 0

Views: 58

Answers (1)

Zim-Zam O'Pootertoot

Reputation: 18148

If each page has its own Map<Float, List<Character>>, then never have more than one thread processing a single page. That way you won't need to synchronize access to the Map or use a concurrent Map implementation. You can statically partition the pages among your workers, as JB Nizet suggests in the comments; another option is to place all pages in a ConcurrentLinkedQueue and have the workers poll the queue for pages to parse, terminating when the queue is empty (poll returns null). Either way, you'll only need one ExecutorService, since each worker is responsible for both parsing and map population.
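A rough sketch of the queue-based option might look like the following. The Page class, the parse method, and the names PageQueueSketch and parseAll are hypothetical stand-ins, since the question does not show the real classes or the third-party parser; here parse simply treats each line of text as one y position. The point is only that each page's HashMap is touched by exactly one worker, so nothing needs to be synchronized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PageQueueSketch {

    /** Hypothetical page type: raw text plus a map owned by this page alone. */
    static final class Page {
        final String text;
        final Map<Float, List<Character>> charsByY = new HashMap<>();

        Page(String text) {
            this.text = text;
        }
    }

    /** Stand-in for the third-party parser (assumption): line index is used as y. */
    static void parse(Page page) {
        float y = 0f;
        for (char c : page.text.toCharArray()) {
            if (c == '\n') {
                y += 1f;
                continue;
            }
            // Reuse the list for this y value, or create it on first use.
            page.charsByY.computeIfAbsent(y, k -> new ArrayList<>()).add(c);
        }
    }

    /** Parses all pages with a single pool; each worker drains the shared queue. */
    static void parseAll(List<Page> pages, int workerCount)
            throws InterruptedException, ExecutionException {
        Queue<Page> queue = new ConcurrentLinkedQueue<>(pages);
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        try {
            List<Callable<Void>> workers = new ArrayList<>();
            for (int i = 0; i < workerCount; i++) {
                workers.add(() -> {
                    Page page;
                    // poll() returns null once the queue is empty, which ends the worker.
                    while ((page = queue.poll()) != null) {
                        parse(page);
                    }
                    return null;
                });
            }
            // invokeAll blocks until every worker is done; get() rethrows any failure.
            for (Future<Void> done : pool.invokeAll(workers)) {
                done.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because a page is removed from the queue by exactly one poll call, no two workers can ever hold the same page, and a single list of Future objects is enough to know when everything has finished.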

Upvotes: 2
