I have a rather large project that uses the ICU regex classes. It can run in either single-threaded or multi-threaded mode. In the latter case, each thread initializes its own internal data (including the regexes it uses).
Originally the project stored each regular expression for later use as a shared_ptr to a RegexPattern. I identified the RegexPattern::matcher() call as a bottleneck, because every call allocates a new RegexMatcher object. So I switched to storing a shared_ptr to a RegexMatcher instead, and just calling reset(str) on it before each match.
I want to stress again: regexes are not shared between threads.
In single-threaded mode everything went fine, and the app ran slightly faster, as I expected. However, when I ran ~10 processing threads at once, the ICU library started to give weird results: in the debug build some data was only partially initialized, and invalid values popped up here and there.
I looked at the ICU code and don't see any static state that might cause such behavior.
So my questions (mostly prompted by the lack of adequate documentation) are:
1) Is it valid to store a RegexMatcher instead of a RegexPattern (RegexMatcher has a member pointing to the pattern it uses)?
2) Are there any limitations on using ICU regexes from multiple threads that are not listed in the documentation?
Just to note: my development platform is Visual C++ 2010, compiling for Win32.
Note: I was not able to reproduce this behavior in an isolated test application that does nothing but regex matching in 10 threads simultaneously, which is why the questions are rather open-ended.
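Actually, I was wrong: there is a case where a single regex is used from different threads. That obviously causes problems once a RegexMatcher is stored instead of a RegexPattern. A RegexPattern is immutable and can be shared between threads, but a RegexMatcher carries per-match state (the current input and match position), so it must not be used by more than one thread at a time.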