Reputation: 207
I have a string:
In Boston in 1690, Benjamin Harris published Publick Occurrences Both Forreign and Domestick. This is considered the first newspaper in the American colonies even though only one edition was published before the paper was suppressed by the government. In 1704, the governor allowed The Boston News-Letter to be published and it became the first continuously published newspaper in the colonies. Soon after, weekly papers began publishing in New York and Philadelphia. These early newspapers followed the British format and were usually four pages long. They mostly carried news from Britain and content depended on the editor's interests. In 1783, the Pennsylvania Evening Post became the first American daily.
I want my program to extract only one sentence from the text above.
For example, if someone types the word 'governor' into a TextBox, the output should show:
In 1704, the governor allowed The Boston News-Letter to be published and it became the first continuously published newspaper in the colonies.
I tried to do it myself, and this is the code I have so far:
string searchWithinThis = "In Boston in 1690, Benjamin Harris published Publick Occurrences Both Forreign and Domestick. This is considered the first newspaper in the American colonies even though only one edition was published before the paper was suppressed by the government. In 1704, the governor allowed The Boston News-Letter to be published and it became the first continuously published newspaper in the colonies. Soon after, weekly papers began publishing in New York and Philadelphia. These early newspapers followed the British format and were usually four pages long. They mostly carried news from Britain and content depended on the editor's interests. In 1783, the Pennsylvania Evening Post became the first American daily.";
string searchForThis = "governor";
int middle = searchWithinThis.IndexOf(searchForThis);
My idea is to find the last '.' before the word 'governor' and the first '.' after it, then use Substring to extract the sentence containing 'governor'. I don't know how to find the indexes of those two '.' characters on either side of the word.
Upvotes: 1
Views: 106
Reputation: 2684
Aha, regex to the rescue!
[^\.]*\bgovernor\b[^\.]*
Snippet: https://regex101.com/r/mB7fM7/2
Code:
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var textToSearch = "governor";
        var textToSearchIn = "In Boston in 1690, Benjamin Harris published Publick Occurrences Both Forreign and Domestick. This is considered the first newspaper in the American colonies even though only one edition was published before the paper was suppressed by the government. In 1704, the governor allowed The Boston News-Letter to be published and it became the first continuously published newspaper in the colonies. Soon after, weekly papers began publishing in New York and Philadelphia. These early newspapers followed the British format and were usually four pages long. They mostly carried news from Britain and content depended on the editor's interests. In 1783, the Pennsylvania Evening Post became the first American daily.";
        // Regex.Escape keeps characters such as '.' or '(' in the search term from being read as regex syntax.
        var pattern = String.Format("[^\\.]*\\b{0}\\b[^\\.]*", Regex.Escape(textToSearch));
        // Regex.Matches returns an empty collection when nothing matches, so no IsMatch check is needed.
        foreach (Match matchedItem in Regex.Matches(textToSearchIn, pattern))
        {
            Console.WriteLine(matchedItem.Value);
            Console.WriteLine();
        }
        // LastOrDefault yields null instead of throwing when there are no matches.
        var lastMatch = Regex.Matches(textToSearchIn, pattern).Cast<Match>().LastOrDefault();
        Console.Read();
    }
}
EDIT: improved the code for word matching using \b and a Regex.MatchCollection for multiple matches.
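If the matched sentences need to be reused rather than printed, the same pattern can be wrapped in a small helper. This is only a sketch: the class name SentenceRegex and the method FindSentences are made up for illustration, and the optional trailing `\.?` plus the Trim call are my additions so that each result keeps its closing period and loses the leading space.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SentenceRegex
{
    // Hypothetical helper: returns every sentence that contains 'word' as a whole word.
    public static List<string> FindSentences(string text, string word)
    {
        // Regex.Escape stops regex metacharacters in the search term from changing the pattern.
        // The optional \.? at the end keeps the sentence's closing period in the match.
        string pattern = @"[^.]*\b" + Regex.Escape(word) + @"\b[^.]*\.?";
        var sentences = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            // Trim drops the space that follows the previous sentence's period.
            sentences.Add(match.Value.Trim());
        }
        return sentences;
    }
}
```

Like the pattern above, this treats every '.' as a sentence boundary, so abbreviations and decimal numbers would split a sentence early.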
Upvotes: 2
Reputation: 39007
One way could be to split the string into sentences, then find the right one:
var sequence = searchWithinThis.Split('.').FirstOrDefault(s => s.Contains(searchForThis));
It's not as optimized as IndexOf though, because Split allocates a new string for every sentence, so it could be an issue if you have a very long text.
Otherwise, you could do something like:
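var index = searchWithinThis.IndexOf(searchForThis);
if (index != -1)
{
    // Default to the whole text in case no '.' is found on one side.
    int startIndex = 0;
    int endIndex = searchWithinThis.Length;
    // Scan forward from the end of the word to the next '.'.
    for (int i = index + searchForThis.Length; i < searchWithinThis.Length; i++)
    {
        if (searchWithinThis[i] == '.')
        {
            endIndex = i;
            break;
        }
    }
    // Scan backward from the start of the word to the previous '.'.
    for (int i = index - 1; i >= 0; i--)
    {
        if (searchWithinThis[i] == '.')
        {
            startIndex = i + 1;
            break;
        }
    }
    // The result excludes the closing '.' and may start with a space.
    var sequence = searchWithinThis.Substring(startIndex, endIndex - startIndex);
}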
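The two loops can also be replaced with the built-in String.LastIndexOf and String.IndexOf overloads that take a start position, which is close to what the question describes. A minimal sketch, assuming a hypothetical helper class SentenceFinder with a method FindSentence (neither name comes from the question), and assuming you want the closing period kept and the leading space trimmed:

```csharp
using System;

static class SentenceFinder
{
    // Hypothetical helper: returns the first sentence containing 'word', or null if there is none.
    public static string FindSentence(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
        {
            return null;
        }
        int index = text.IndexOf(word, StringComparison.Ordinal);
        if (index == -1)
        {
            return null;
        }
        // LastIndexOf searches backward from 'index'; it returns -1 when no '.' precedes
        // the word, so adding 1 gives 0, the start of the text.
        int start = text.LastIndexOf('.', index) + 1;
        // IndexOf searches forward from the end of the word for the closing '.'.
        int end = text.IndexOf('.', index + word.Length);
        // Include the period when there is one; otherwise run to the end of the text.
        end = (end == -1) ? text.Length : end + 1;
        return text.Substring(start, end - start).Trim();
    }
}
```

For example, SentenceFinder.FindSentence("A b. C d e. F.", "d") returns "C d e.". Note that this matches substrings, so searching for 'govern' would also hit 'government'.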
Upvotes: 1