Reputation: 1246
I have a program which color codes a returned results set a certain way depending on what the results are. Due to the length of time it takes to color-code the results (currently being done with Regex and RichTextBox.Select + .SelectionColor), I cut off color-coding at 400 results. At around that number it takes about 20 seconds, which is just about max time of what I'd consider reasonable.
To try an improve performance I re-wrote the Regex part to use a Parallel.ForEach
loop to iterate through the MatchCollection
, but the time was about the same (18-19 seconds vs 20)! Is just not a job that lends itself to Parallel programming very well? Should I try something different? Any advice is welcome. Thanks!
PS: Thought it was a bit strange that my CPU utilization never went about 14%, with or without Parallel.ForEach.
Code
MatchCollection startMatches = Regex.Matches(tempRTB.Text, startPattern);
object locker = new object();
System.Threading.Tasks.Parallel.ForEach(startMatches.Cast<Match>(), m =>
{
int i = 0;
foreach (Group g in m.Groups)
{
if (i > 0 && i < 5 && g.Length > 0)
{
tempRTB.Invoke(new Func<bool>(
delegate
{
lock (locker)
{
tempRTB.Select(g.Index, g.Length);
if ((i & 1) == 0) // Even number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
else // Odd number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
return true;
}
}));
}
else if (i == 5 && g.Length > 0)
{
var result = tempRTB.Invoke(new Func<string>(
delegate
{
lock (locker)
{
return tempRTB.Text.Substring(g.Index, g.Length);
}
}));
MatchCollection subMatches = Regex.Matches((string)result, pattern);
foreach (Match subMatch in subMatches)
{
int j = 0;
foreach (Group subGroup in subMatch.Groups)
{
if (j > 0 && subGroup.Length > 0)
{
tempRTB.Invoke(new Func<bool>(
delegate
{
lock (locker)
{
tempRTB.Select(g.Index + subGroup.Index, subGroup.Length);
if ((j & 1) == 0) // Even number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
else // Odd number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
return true;
}
}));
}
j++;
}
}
}
i++;
}
});
Upvotes: 0
Views: 1574
Reputation: 174309
The most time in your code is most likely spent in the part that actually selects the text in the richtext box and sets the color.
This code is impossible to execute in parallel, because it has to be marshalled to the UI thread - which you do via tempRTB.Invoke
.
Furthermore, you explicitly make sure that the highlighting is not executed in parallel but sequentially by using the lock
statement. This is unnecessary, because all of that code is run on the single UI thread anyway.
You could try to improve your performance by suspending the layouting of your UI while you select and color the text in the RTB:
tempRTB.SuspendLayout();
// your loop
tempRTB.ResumeLayout();
Upvotes: 4
Reputation: 203817
Virtually no aspect of your program is actually able to run in parallel.
The generation of the matches needs to be done sequentially. It can't find the second match until it has already found the first. Parallel.ForEach
will, at best, allow you to process the results of the sequence in parallel, but they are still generated sequentially. This is where the majority of your time consuming work seems to be, and there are no gains there.
On top of that, you aren't really processing the results in parallel either. The majority of code run in the body of your loop is all inside an invoke to the UI thread, which means it's all being run by a single thread.
In short, only a tiny, tiny bit of your program is actually run in parallel, and using parallelization in general adds some overhead; it sounds like you're just barely getting more than that overhead. There isn't really much that you did wrong, the operation just inherently doesn't lend itself to parallelization, unless there is an effective way of breaking up the initial string into several smaller chucks that the regex can parse individually (in parallel).
Upvotes: 5