Alex Z

Reputation: 1867

ICU regex - memory corruption in multithreading usage scenario

I have a rather large project that uses the ICU regex classes. It can run in either single-threaded or multi-threaded mode. In the latter case each thread initializes its own internal data (including the regexes it uses).

Originally the project stored a shared_ptr to a RegexPattern for each regular expression it wanted to reuse. I identified the RegexPattern::matcher() call as a bottleneck, since it allocates a new RegexMatcher object on every call, so I switched to storing a shared_ptr to a RegexMatcher instead and just calling reset(str) before matching.

I want to stress again - regexes are not shared between threads.

So it all went fine in single-threaded mode, and the app ran slightly faster, as I expected. However, when I tried to run ~10 processing threads at once, the ICU library started to give weird results: in the debug build some parts of the data were only partially initialized, and invalid values popped up here and there.

I looked at the ICU code and don't see any static stuff that might cause such behavior.

So the questions are (mostly caused by the lack of appropriate documentation):
1) Is it a valid scenario to store a RegexMatcher instead of a RegexPattern (RegexMatcher has a member pointing to the pattern being used)?
2) Are there any limitations on multithreaded usage of ICU regexes that are not listed in the documentation?

Just to note: my dev platform is Visual C++ 2010, compiling for Win32

Note: I was not able to reproduce this behavior in an isolated test application that does nothing but regex matching in 10 threads simultaneously, which is why the questions are rather open-ended.

Upvotes: 2

Views: 441

Answers (1)

Alex Z

Reputation: 1867

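Actually I was wrong - there is a case where a single regex is used from different threads. Obviously that causes issues when using RegexMatcher instead of RegexPattern, because a RegexMatcher carries mutable state (the input set by reset() and the current match position), while a compiled RegexPattern is only read.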

Upvotes: 2
