Reputation: 185
I'm trying to count the number of paragraphs in a string in C#.
I'm defining a paragraph as a block of text whose lines may be separated by a single newline. Paragraphs must be separated by two or more newlines. So:
This is a paragraph. This is a paragraph. This is a paragraph.
My first thought was to split the string on \n\n and then count the parts, but this doesn't work properly when more than one blank line separates paragraphs, when there are blank lines at the beginning or end of the file, or when the file has only one line.
How can I accurately get the number of paragraphs in a string, either through a regular expression or through another method?
Upvotes: 1
Views: 2495
Reputation:
Your definition of paragraph can be easily translated to a regex, to get you all the paragraphs:
Regex.Matches(s, "[^\r\n]+((\r|\n|\r\n)[^\r\n]+)*")
[^\r\n]+ means a non-zero number of non-newline characters. \r|\n|\r\n are the various forms of newline. For a paragraph, these two need to alternate: a line, then a single newline, then another line, and so on.
I think this is a better approach than looking for the paragraph separators, because looking for paragraph separators requires too many special cases to give correct results.
To treat blank lines as empty lines, you can change the definition of "line" from "non-zero number of non-newline characters" to "any number of non-newline characters, followed by a non-blank character, followed by any number of non-newline characters". For simplicity, the only character I've counted as blank that cannot be part of a line break is the space character, but you may want to include other characters (e.g. tab) too.
Regex.Matches(s, "[^\r\n]*[^ \r\n]+[^\r\n]*((\r|\n|\r\n)[^\r\n]*[^ \r\n]+[^\r\n]*)*")
Also, this is already over the edge of what I think is sufficiently easy to read, so this could use some restructuring, but I'm not sure of the best way to do that.
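One possible restructuring, offered as a sketch rather than a definitive version: build the pattern from named pieces and use verbatim strings so the regex engine itself interprets \r and \n. The names ParagraphCounter, Newline, Line and CountParagraphs are made up for illustration, and "blank" is assumed here to mean spaces and tabs only.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParagraphCounter
{
    // A single line break; \r\n is listed first so it is consumed as one unit.
    private const string Newline = @"(?:\r\n|\r|\n)";

    // A line containing at least one character that is not a space, tab, or newline.
    private const string Line = @"[^\r\n]*[^ \t\r\n][^\r\n]*";

    // A paragraph: one non-blank line, then any number of further
    // non-blank lines, each preceded by exactly one line break.
    private static readonly Regex Paragraph =
        new Regex(Line + "(?:" + Newline + Line + ")*");

    // Counts the paragraphs in s (hypothetical helper name).
    public static int CountParagraphs(string s)
    {
        return Paragraph.Matches(s).Count;
    }
}
```

Calling ParagraphCounter.CountParagraphs(s) then gives the count directly, and the individual matches are still available from the same pattern if the paragraph text is needed.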
Upvotes: 5
Reputation: 117064
If you're happy to avoid regex then this works:
// Requires: using System; using System.Linq;
var paragraphs =
    text
        .Split(
            new[] { Environment.NewLine + Environment.NewLine },
            StringSplitOptions.RemoveEmptyEntries)
        .Count();
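Splitting on a doubled Environment.NewLine depends on the platform's line ending, and a leftover line break (for example after three trailing newlines) survives RemoveEmptyEntries and is counted as a paragraph. As a hedged alternative that still avoids regex, the sketch below walks the text line by line and counts each transition from a blank line to a non-blank one. LineScanner and CountParagraphs are hypothetical names, and treating whitespace-only lines as blank is an assumption.

```csharp
using System;
using System.IO;

static class LineScanner
{
    // Counts paragraphs by counting the lines that start a new run of
    // non-blank lines. StringReader.ReadLine accepts \r, \n and \r\n.
    public static int CountParagraphs(string text)
    {
        int count = 0;
        bool inParagraph = false;

        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Assumption: a line of only whitespace counts as blank.
                bool blank = string.IsNullOrWhiteSpace(line);

                // A non-blank line that follows a blank line (or the start
                // of the text) opens a new paragraph.
                if (!blank && !inParagraph)
                {
                    count++;
                }
                inParagraph = !blank;
            }
        }
        return count;
    }
}
```

Because it only looks at whether each line is blank, this handles runs of several blank lines, blank lines at the start or end, and single-line input without special cases.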
Upvotes: 2
Reputation: 1720
You may try the following:
MultiParagraphString.Split(new [] {Environment.NewLine},
StringSplitOptions.RemoveEmptyEntries);
That will return a string[], which is also an IEnumerable<string>. If you want to transform the entries into your own structures, just use Select:
MultiParagraphString.Split(new [] {Environment.NewLine},
StringSplitOptions.RemoveEmptyEntries)
.Select(s => new ParagraphInfo(s)).ToList();
Copied from the question "How to separate paragraphs in a string".
Upvotes: -2