Reputation: 2415
In: preferences = 'Hello my name is paul. I hate puzzles.'
I want to extract Hello my name is paul.
In: preferences = 'Salutations my name is richard. I love pizza. I hate rain.'
I want to extract Salutations my name is richard. I love pizza.
In: preferences = 'Hi my name is bob. I enjoy ice cream.'
I want to extract Hi my name is bob. I enjoy ice cream.
In other words, I would like to drop the preferences = ' wrapper and its closing ', and also drop the last sentence (the text ending in .) that has the word hate in it, if present.
My problem is that my regex stops at the first . and doesn't extract the subsequent sentences.
Thanks.
Upvotes: 2
Views: 484
Reputation:
One of these might work -
Results in Match[1] buffer
preferences\s*=\s*'([^']*?)(?:(?<=[.'])[^.']*hate[^.']*\.\s*)?'
or
Results in Match[1] buffer
preferences\s*=\s*'([^']*?)(?=(?<=[.'])[^.']*hate[^.']*\.\s*'|')
or
(.Net only)
Results in Match[0] buffer
(?<=preferences\s*=\s*')[^']*?(?=(?<=[.'])[^.']*hate[^.']*\.\s*'|')
Edit: I'm not using \b around 'hate', nor the begin/end anchors ^ and $; feel free to add them if that's what you need. As a side note, it's puzzling that the apostrophe and period are used as delimiters for a string variable that holds free-form text.
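For what it's worth, here is a minimal C# sketch of how the first pattern could be applied and where the result ends up. Only the pattern comes from the answer above; the class and method names (PreferencesDemo, ExtractKept) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreferencesDemo
{
    // First pattern from the answer; the text to keep is captured in group 1.
    private static readonly Regex Pattern = new Regex(
        @"preferences\s*=\s*'([^']*?)(?:(?<=[.'])[^.']*hate[^.']*\.\s*)?'");

    // Hypothetical helper: returns the captured text, or null when the input does not match.
    public static string ExtractKept(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // The three inputs from the question
        Console.WriteLine(ExtractKept("preferences = 'Hello my name is paul. I hate puzzles.'"));
        Console.WriteLine(ExtractKept("preferences = 'Salutations my name is richard. I love pizza. I hate rain.'"));
        Console.WriteLine(ExtractKept("preferences = 'Hi my name is bob. I enjoy ice cream.'"));
    }
}
```

Because 'hate' is not wrapped in \b here, a word that merely contains those letters would also trigger the removal, which is the caveat the edit note mentions.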
Upvotes: 0
Reputation: 192487
I did it with two regexes. The first strips the preferences = '...' wrapper, and the second eliminates any sentence containing the word "hate". The second regex uses a positive lookbehind to find sentences that contain the keyword and replaces them with the empty string.
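using System;
using System.Text.RegularExpressions;

String[] tests = {
    "preferences = 'Hello my name is Paul. I hate puzzles.'",
    "preferences = 'Salutations my name is Richard. I love pizza. I hate rain.'",
    "preferences = 'Hi my name is Bob. Regex turns me on.'"};
// re1 captures the text between the quotes
var re1 = new Regex("preferences = '(.*)'");
// re2 matches one sentence that contains the whole word "hate".
// The lookbehind uses [^.]* rather than .* so it cannot reach back into an earlier sentence.
var re2 = new Regex("([^.]+(?<=\\bhate\\b[^.]*))\\.\\s*");
for (int i = 0; i < tests.Length; i++)
{
    Console.WriteLine("{0}: {1}", i, tests[i]);
    var m = re1.Match(tests[i]);
    if (m.Success)
    {
        var s = m.Groups[1].ToString();
        // Remove every sentence that contains the flag word
        s = re2.Replace(s, "");
        Console.WriteLine("   {0}", s);
    }
    Console.WriteLine();
}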
This may not be exactly what you want, since you asked to eliminate only the last sentence, and only if it contains the flag word. It's easy to adjust, though: to strip only the last sentence when it contains the word, append a $ to the end of re2.
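The last-sentence-only adjustment could look roughly like the sketch below. It matches the sentence directly instead of going through a lookbehind, which is a simplification of mine rather than the exact re2 from the answer, and the names LastSentenceDemo and StripLastHate are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastSentenceDemo
{
    // Same outer pattern as re1: captures the text between the quotes.
    private static readonly Regex Outer = new Regex("preferences = '(.*)'");

    // One sentence containing the whole word "hate", anchored to the end of the text by $.
    private static readonly Regex LastHate = new Regex(@"[^.]*\bhate\b[^.]*\.\s*$");

    // Hypothetical helper: returns the inner text with a trailing "hate" sentence removed,
    // or null when the input is not of the form preferences = '...'.
    public static string StripLastHate(string input)
    {
        Match m = Outer.Match(input);
        if (!m.Success) return null;
        return LastHate.Replace(m.Groups[1].Value, "");
    }

    public static void Main()
    {
        // The last sentence contains "hate", so it is removed
        Console.WriteLine(StripLastHate("preferences = 'Salutations my name is Richard. I love pizza. I hate rain.'"));
        // "hate" appears only in the first sentence, so nothing is removed
        Console.WriteLine(StripLastHate("preferences = 'I hate rain. I love pizza.'"));
    }
}
```

The $ anchor is what keeps earlier sentences untouched: a sentence with "hate" in the middle of the text can no longer match, because the match has to run to the end of the string.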
Upvotes: 1
Reputation: 138017
You can achieve what you want using a regular expression:
^preferences\s*=\s*'(.*?\.)(?:[^.]*\bhate\b[^.]*\.)?'$
That one isn't too tricky:
(.*?\.) - Match your expected output, that will be captured in group $1. The pattern matches "sentences" (as you've defined), but lazily (*?), as few as it must.
(?:[^.]*\bhate\b[^.]*\.)? - optionally match the last sentence, but only if it contains "hate". If it can match, and it is the last sentence, the matching engine will not backtrack, and the last sentence will not be included in the captured group.

Here's a working example in Rubular: http://www.rubular.com/r/qTuMmB3ySj
(I've added \r\n in a few places, to avoid [^.] matching new lines)
Honestly though, you can do better than a single regular expression here, if you can avoid using one.
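As a rough illustration of the pattern above in C#, the sketch below runs it against the three inputs from the question and reads group 1. The class and method names (SingleRegexDemo, Extract) are placeholders of mine, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleRegexDemo
{
    // The pattern from the answer: group 1 holds the sentences to keep,
    // and the optional non-capturing group swallows a final "hate" sentence.
    private static readonly Regex Pattern = new Regex(
        @"^preferences\s*=\s*'(.*?\.)(?:[^.]*\bhate\b[^.]*\.)?'$");

    // Hypothetical helper: returns group 1, or null when the line does not match.
    public static string Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] lines =
        {
            "preferences = 'Hello my name is paul. I hate puzzles.'",
            "preferences = 'Salutations my name is richard. I love pizza. I hate rain.'",
            "preferences = 'Hi my name is bob. I enjoy ice cream.'"
        };
        foreach (string line in lines)
        {
            Console.WriteLine(Extract(line));
        }
    }
}
```

Note that the ^ and $ anchors mean each input is expected to be a single preferences line; matching several lines inside one string would need RegexOptions.Multiline, which this sketch does not use.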
Upvotes: 2
Reputation: 2763
While this does not use a regex, it will achieve what you are aiming for:
// Assumes: using System; using System.Collections.Generic; using System.Linq;
// and that preferences is a List<string> holding the raw "preferences = '...'" lines.
List<string> resultsList = new List<string>();
for (int i = 0; i < preferences.Count; i++)
{
    // The substring drops the leading "preferences = '" (15 characters) and the trailing "'";
    // Split('.') then breaks the remaining text into sentences (without their periods).
    List<string> tempList = preferences[i].Substring(15, preferences[i].Length - 15 - 1).Split('.').ToList();
    string buildFinalString = "";
    // Traverse tempList and keep only the sentences that do not contain "hate" (case-insensitive)
    foreach (string x in tempList)
    {
        // Also skip the empty piece that Split leaves after the final period
        if (x.Trim().Length > 0 && x.IndexOf("hate", StringComparison.OrdinalIgnoreCase) < 0)
        {
            // Split removed the period, so put it back
            buildFinalString = buildFinalString + x + ".";
        }
    }
    resultsList.Add(buildFinalString.Trim());
}
Or, if you only wanted to check the last string in the "tempList" list for the word "hate", that would also be possible.
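One way the last-sentence-only check might be written is sketched below. It assumes every input starts with exactly preferences = ' and ends with ', as in the question; the class and method names (LastSentenceCheck, DropLastIfHate) are invented for the example.

```csharp
using System;

public static class LastSentenceCheck
{
    // Assumed fixed prefix, taken from the question's examples.
    private const string Prefix = "preferences = '";

    // Hypothetical helper: removes the final sentence only when it contains "hate".
    public static string DropLastIfHate(string line)
    {
        // Drop the leading "preferences = '" and the trailing "'"
        string body = line.Substring(Prefix.Length, line.Length - Prefix.Length - 1).Trim();

        // Ignore the final period when looking for the start of the last sentence
        string withoutFinalDot = body.TrimEnd('.');
        int cut = withoutFinalDot.LastIndexOf('.');   // -1 when there is only one sentence
        string last = withoutFinalDot.Substring(cut + 1);

        if (last.IndexOf("hate", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // Keep everything up to and including the previous period
            return body.Substring(0, cut + 1);
        }
        return body;
    }

    public static void Main()
    {
        Console.WriteLine(DropLastIfHate("preferences = 'Hello my name is paul. I hate puzzles.'"));
        Console.WriteLine(DropLastIfHate("preferences = 'Hi my name is bob. I enjoy ice cream.'"));
    }
}
```

Like the loop above, this is a plain substring test, so a word that merely contains "hate" would also count as a hit.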
Upvotes: 1