Reputation: 375
The following code crashes after sometime due to high memory usage (I open taskmanager and its used memory keeps raising). But I cant see any memory leak except for garbage collection not doing its job. Any suggestions?
//Load a list of regex
//Load a list of phrases
//Open a output file
foreach (string regexString in regexList)
{
int num = 0;
Regex regex = new Regex(regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (string phrase in phraseList)
num += regex.Matches(phrase).Count;
if (num > 0)
{
output.WriteLine(String.Join(" ", num, phrase));
output.Flush();
}
}
EDIT:
Complete Code: http://pastebin.com/0SQYn44z
EDIT2:
I found and posted the solution (foreach loop) Thanks for everyone that tried to help in anyway.
Upvotes: 4
Views: 4080
Reputation: 75222
Let's see if I've got this right. You have a large list of keywords, and for every permutation of two of those words you create a new string by joining them together with a space in the middle, plus another string with the order of the words reversed (which is unnecessary, BTW). Finally, you create a Regex object from each of the strings.
Having done that, you start iterating over your list of FeedItem objects, taking two attributes of each and forming them into a string like you did with the keywords, and try to match that string with each of the two regexes. In short, I think you should hold off on the finger-pointing for the moment and concentrate on refactoring your code. In fact, I advise you to forget about regexes for now. You aren't using any of their power, and they're distracting you from your real problems.
Upvotes: 1
Reputation: 4992
After seeing your code, I can already recommend not using two, but one combined Regex. Something like:
Regex regex = new Regex(
@"\b(?:" + s1 + " " + s2 + "|" + s2 + " " + s1 + @")\b",
RegexOptions.IgnoreCase);
Note that the series of +
operators there is converted into a single String.Concat(params string[])
call. With only 2 to 4 parts, it wouldn't even create the temporary array because there are more specialized Concat
overloads for that case. Join
always uses a temporary array and also needs to take care of the separator string, even if it's empty.
Another thing you might want to keep in mind is that running your program in debug mode will differ quite a lot from running it in release mode. For example that keywords
list will remain live until the function returns. I recommend explicitly restricting its lifetime and also cleaning up the other objects as soon as possible. Maybe something like the following:
string[] keywordArray;
using (StreamReader keywordsFile = new StreamReader("keywords.txt"))
{
List<string> keywords = new List<string>();
string keyword;
while ((keyword = keywordsFile.ReadLine()) != null)
keywords.Add(keyword);
keywordArray = keywords.ToArray();
}
Or skip it altogether:
string[] keywordArray = File.ReadAllLines("keywords.txt"); // internally uses above code
However, somehow I have the feeling loading your huge feeds file as a whole into memory is a bad idea. It would require a lot less memory if you could deserialize one FeedItem
at a time and return it from an iterator method. Then all you need to do is restructure the algorithm to have the loop over feed items as outer-most loop.
Upvotes: 1
Reputation: 22133
You use some types that implement IDisposable without calling Dipose():
Wrap the use of those types in using statements and see if that improves thing for you.
Upvotes: 1
Reputation: 19027
I can't tell from your example, but it is possible that the RegexOptions.Compiled
flag causes the problem. From msdn:
Compiled specifies that the regular expression is compiled to an assembly. This yields faster execution but increases startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method.
If a regex is compiled to an assembly, then the generated code can not be unloaded until you restart the application as .Net does not unload assemblies.
This means that if you have a lot of different regular expressions, and you don't reuse them, a compiled Regex is usualy not a good idea.
Upvotes: 6
Reputation: 14079
Chances are that your regex uses excessive backtracking resulting in a long execute time and high memory usage
Can you post the regex and the input so I can check for sure?
Upvotes: 2