Reputation: 241
I know you've seen many questions like this one, but I hope mine is a little different. I'm writing a translator and want to split a text into sentences, but when I wrote this code:
public static string[] GetSentences(string Text)
{
    if (Text.Contains(". ") || Text.Contains("? ") || Text.Contains("! "))
        return Text.Split(new string[] { ". ", "? ", "! " }, StringSplitOptions.RemoveEmptyEntries);
    else
        return new string[0];
}
it removed the ".", "?" and "!" characters. I want to keep them. How can I do that?
NOTE: I want to split on ". " (a dot followed by a space), "? " (a question mark followed by a space) and "! " (an exclamation mark followed by a space).
Upvotes: 3
Views: 3859
Reputation: 273314
Simple: replace them first. I'll use "|" as the marker for readability, but you may want something more exotic, i.e. a character that cannot occur in your text.
// this part could be made a little smarter and more flexible.
// So, just the basic idea:
Text = Text.Replace(". ", ". |").Replace("? ", "? |").Replace("! ", "! |");
if (Text.Contains("|"))
    // new[] { '|' } also compiles on .NET Framework, which lacks the Split(char, StringSplitOptions) overload
    return Text.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
I also wonder about the else return new string[0]; part, which seems odd. Assuming that when there are no delimiters you want to return the input string as a single sentence, you should just remove the if/else construct.
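A complete version of the replace-then-split approach, with the if/else removed as suggested, might look like the minimal sketch below. The class name SentenceSplitter and the Marker constant are hypothetical, and it assumes '|' never occurs in the input. Unlike the snippet above, it replaces ". " with ".|" rather than ". |", so the space after each delimiter is dropped instead of being left at the end of each sentence.

```csharp
using System;

public static class SentenceSplitter
{
    // Hypothetical marker: assumed never to appear in the input text.
    private const char Marker = '|';

    public static string[] GetSentences(string text)
    {
        // Keep each delimiter, replace the space after it with the marker.
        string marked = text
            .Replace(". ", "." + Marker)
            .Replace("? ", "?" + Marker)
            .Replace("! ", "!" + Marker);

        // No if/else needed: without any marker, Split returns the whole
        // (non-empty) input as a single element.
        return marked.Split(new[] { Marker }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For "Hi. How are you? Fine!" this returns "Hi.", "How are you?" and "Fine!"; for text without any delimiter it returns the whole text as one element.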
Upvotes: 16
Reputation: 12807
Regex way (requires using System.Text.RegularExpressions;):
return Regex.Split(Text, @"(?<=[.?!])\s+");
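So you just split the string on whitespace that is preceded by one of ".", "?" or "!". The lookbehind (?<=[.?!]) checks the preceding character without consuming it, so the punctuation stays at the end of each sentence; only the whitespace matched by \s+ is removed.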
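Wrapped in a method, the regex approach could look like this minimal sketch (the class name RegexSentenceSplitter is hypothetical). Caching the pattern in a static Regex instance is a design choice, not a requirement; the static Regex.Split call above works just as well.

```csharp
using System.Text.RegularExpressions;

public static class RegexSentenceSplitter
{
    // (?<=[.?!]) is a zero-width lookbehind: it asserts that the previous
    // character is '.', '?' or '!' without consuming it, so the punctuation
    // stays in the sentence. \s+ then matches (and removes) the whitespace.
    private static readonly Regex SentenceBoundary = new Regex(@"(?<=[.?!])\s+");

    public static string[] GetSentences(string text)
    {
        return SentenceBoundary.Split(text);
    }
}
```

Caveats: abbreviations such as "e.g. " are split too, and whitespace after the last sentence produces a trailing empty string, because Regex.Split has no equivalent of StringSplitOptions.RemoveEmptyEntries.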
Upvotes: 2