I am trying to use C# to split a SQL script that contains regions by using Regex.Split(), but I can't seem to get the pattern right. I really struggle with the concept of Regex and find it bewildering in most circumstances, although I understand it to be the best solution for the following.
Input string (the real input is about 100,000 times the size of the sample below, hence the sluggishness of my current method):
--#region someregioncomment
aaaa
bbbb
--#endregion
where each line break is \r\n.
Output Dictionary<string, string>
Key: --#region someregioncomment
Value: aaaa\r\nbbbb
At the moment I am doing this:
Dictionary<string, string> regionValues = new Dictionary<string, string>();
using (StringReader sr = new StringReader(SSBS))
{
    string strCurrentRegion = "";
    string strCurrentRegionContents = "";
    while (sr.Peek() != -1)
    {
        string strCurrentLine = sr.ReadLine();
        if (strCurrentLine.Contains("--#region"))
        {
            strCurrentRegion = strCurrentLine;
        }
        if (string.IsNullOrEmpty(strCurrentRegion))
        {
            continue;
        }
        else if (strCurrentLine.Contains("--#endregion"))
        {
            regionValues.Add(strCurrentRegion, strCurrentRegionContents);
            strCurrentRegion = "";
        }
        else
        {
            strCurrentRegionContents += ("\r\n" + strCurrentLine);
        }
    }
}
However, I felt that this could be achieved with a Regex pattern combined with Regex.Split(), but I can't seem to get the gist of what the pattern should look like.
I have attempted:
(--#region.*?)\n
(--#region)\w*
I just can't seem to get it! Any help for my desired output appreciated :)
Thanks.
Upvotes: 0
The problem with String.Split and Regex is that they need the whole file loaded into memory. So why not read the script line by line with a StreamReader?
using System.Collections.Generic;
using System.IO;
using System.Text;

Dictionary<string, string> regions = new Dictionary<string, string>();
string regionName = null; // null while outside any region
StringBuilder regionString = new StringBuilder();
using (StreamReader streamReader = File.OpenText("MyFile.txt"))
{
    while (!streamReader.EndOfStream)
    {
        string line = streamReader.ReadLine();
        if (line.StartsWith("--#region ")) // Beginning of the region
        {
            regionName = line.Substring(10); // Skip the 10-character "--#region " prefix
        }
        else if (line.StartsWith("--#endregion")) // End of the region
        {
            if (regionName == null)
                throw new InvalidDataException("#endregion found without a #region.");
            regions.Add(regionName, regionString.ToString());
            regionString.Clear();
            regionName = null; // Leave the region so lines between regions are ignored
        }
        else if (regionName != null) // If the line is in a region
        {
            regionString.AppendLine(line);
        }
    }
}
Be careful with the Dictionary: if your file contains multiple regions with the same name, Dictionary.Add will throw an ArgumentException on the second one.
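To avoid failing on repeated names, you have to decide what a repeated name should mean. The sketch below is one possible variation on the loop above, not the only option. It assumes that regions sharing a name should have their bodies merged, and it reads from any TextReader so it works for a file or for a string already in memory. RegionParser and ParseRegions are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RegionParser
{
    // Hypothetical helper: collects region bodies keyed by region name.
    // Assumption: regions that repeat a name are merged rather than rejected.
    public static Dictionary<string, string> ParseRegions(TextReader reader)
    {
        const string startMarker = "--#region ";
        const string endMarker = "--#endregion";

        var bodies = new Dictionary<string, StringBuilder>();
        StringBuilder current = null; // null while outside any region
        string line;

        // ReadLine returns null at the end of the input for every TextReader.
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(startMarker, StringComparison.Ordinal))
            {
                string name = line.Substring(startMarker.Length);
                // Reuse the existing builder when the name was seen before.
                if (!bodies.TryGetValue(name, out current))
                {
                    current = new StringBuilder();
                    bodies.Add(name, current);
                }
            }
            else if (line.StartsWith(endMarker, StringComparison.Ordinal))
            {
                current = null; // lines between regions are ignored
            }
            else if (current != null)
            {
                current.AppendLine(line);
            }
        }

        var result = new Dictionary<string, string>();
        foreach (var pair in bodies)
        {
            result.Add(pair.Key, pair.Value.ToString());
        }
        return result;
    }
}
```

It could be called as RegionParser.ParseRegions(File.OpenText("MyFile.txt")) for a file, or with a StringReader for an in-memory script. If a repeated name should be an error instead, keeping Dictionary.Add and letting it throw is the simpler choice.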
A few pieces of advice:
Use StringBuilder instead of concatenating the string (for better performance).
Use String.StartsWith instead of String.Contains for two reasons: performance (StartsWith is cheaper to check) and correctness (imagine a string literal containing "--#region" in your SQL: what happens then?).
Don't use "\r\n", which is environment specific; use Environment.NewLine instead.
sr.Peek() shouldn't be used to test for the end of the file/stream. There is a property designed for this: StreamReader.EndOfStream.
Upvotes: 2
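Since the question asked for a pattern: if the whole script is already in memory (as with the SSBS string in the question), Regex.Matches with named groups is probably a closer fit than Regex.Split. The following is only a sketch under a few assumptions: region markers start at the beginning of a line, regions are not nested, and the dictionary is keyed by the text after "--#region " as in the answer above. RegionRegex and ExtractRegions are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegionRegex
{
    // Multiline: ^ matches at the start of every line.
    // Singleline: . also matches newlines, so the lazy body can span lines.
    private static readonly Regex RegionPattern = new Regex(
        @"^--#region[ \t]+(?<name>[^\r\n]*)\r?\n(?<body>.*?)(?:\r?\n)?^--#endregion",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static Dictionary<string, string> ExtractRegions(string script)
    {
        var regions = new Dictionary<string, string>();
        foreach (Match match in RegionPattern.Matches(script))
        {
            // Assumption: the indexer keeps the last region when a name repeats.
            regions[match.Groups["name"].Value] = match.Groups["body"].Value;
        }
        return regions;
    }
}
```

This still holds the entire script in memory, which is the cost the answer warns about, so the line-by-line reader remains the safer choice for very large files.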