Reputation: 7407
I have a large string (about 10k characters) and I search it for a word (or several words) with regex(word).Matches(scrappedstring).
How can I extract the whole sentence that contains that word? I was thinking of taking the substring after the searched word up to the first dot, exclamation mark, question mark, etc. But how do I get the part of the sentence before the searched word?
Or is there a better approach?
Upvotes: 2
Views: 4752
Reputation: 17074
If your sentence boundaries are, e.g., . , ! , ? and ; , match all sentences with the expression [^.!?;]*(wordmatch)[^.!?;]*
This returns every sentence that contains the desired wordmatch.
Example:
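using System.Linq;
using System.Text.RegularExpressions;
var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
var m = r.Matches(s);
// Each match runs from one boundary to the next, so trim the surrounding whitespace.
var result = m.Cast<Match>().Select(x => x.Value.Trim()).ToList();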
Upvotes: 2
Reputation: 3794
You can do that in two steps.
First split the text into sentences, then keep only the sentences that contain the word.
Something like this:
var input = "A large text with many sentences. Many chars in a string!. A sentence without the pattern word.";
// Step 1: split the text into sentences.
var patternPhrase = @"(?<=(^|[.!?]\s*))[^ .!?][^.!?]+[.!?]";
// Step 2: keep only the sentences that contain the word.
var patternWord = @"many";
var result = Regex
    .Matches(input, patternPhrase) // step 1
    .Cast<Match>()
    .Select(s => s.Value)
    .Where(w => Regex.IsMatch(w, patternWord, RegexOptions.IgnoreCase)); // step 2
foreach (var item in result)
{
    // Do something with each matching sentence.
}
Upvotes: 0
Reputation: 1697
Extract the sentences from the input, then search for the specified word(s) within each sentence. Return the sentences in which the word(s) are present.
public List<string> GetMatchedString(string match, string input)
{
    // Split on sentence terminators; the terminators themselves are dropped.
    var sentenceList = input.Split(new char[] { '.', '?', '!' });
    var regex = new Regex(match);
    // Keep only the sentences in which the pattern occurs.
    return sentenceList.Where(sentence => regex.IsMatch(sentence)).ToList();
}
Upvotes: 0
Reputation: 38810
Once you have the position of the word, read forward to the next . or the end of the file, and also read backwards from the beginning of the word to the previous . or the beginning of the file. Those two positions let you extract the sentence.
Note that this is not foolproof: in its simplest form as outlined above, an abbreviation such as e.g. would mean the sentence started after the g. which is probably not the case.
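The backward and forward scan described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SentenceExtractor, the method SentenceAt, and the choice of '.', '!' and '?' as terminators are hypothetical, and the sketch inherits the abbreviation problem mentioned above.

```csharp
using System;

public static class SentenceExtractor
{
    // Terminators assumed for this sketch; adjust to the text being scanned.
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Returns the sentence around the word that starts at matchIndex
    // and is matchLength characters long.
    public static string SentenceAt(string text, int matchIndex, int matchLength)
    {
        // Scan backwards for the previous terminator; -1 stands for the start of the text.
        int start = matchIndex == 0
            ? -1
            : text.LastIndexOfAny(Terminators, matchIndex - 1);

        // Scan forwards for the next terminator; fall back to the last character.
        int end = text.IndexOfAny(Terminators, matchIndex + matchLength);
        if (end < 0)
        {
            end = text.Length - 1;
        }

        // Take everything after the previous terminator up to and including the next one.
        return text.Substring(start + 1, end - start).Trim();
    }
}
```

With a Match object m from Regex.Matches, the call would be SentenceExtractor.SentenceAt(text, m.Index, m.Length).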
Upvotes: 0
Reputation: 101
You can split the text into substrings at the sentence terminators (dot, exclamation mark, question mark, etc.) and search for the word in each sentence inside a loop.
Then return the substring in which you find the matching word.
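A minimal sketch of that loop, assuming '.', '!' and '?' are the only terminators and that a plain case-insensitive substring search is good enough. The names SentenceSearch and FirstSentenceWith are hypothetical, and the terminators are dropped by the split.

```csharp
using System;

public static class SentenceSearch
{
    // Returns the first sentence that contains word, or null if none does.
    public static string FirstSentenceWith(string text, string word)
    {
        // Split on the assumed terminators and skip empty fragments.
        var sentences = text.Split(
            new[] { '.', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (var sentence in sentences)
        {
            // Plain substring search, ignoring case.
            if (sentence.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return sentence.Trim();
            }
        }

        return null;
    }
}
```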
Upvotes: 0