Reputation: 193
I have a text file with text like:
"Lorem ipsum text. Second lorem ipsum. How are You. It's
ok. Done. Something else now.
New line. Halo. Text. Are You ok."
I need a regex to find all sentences (the text between . characters) except the ones that contain the word "else". I have tried many regex patterns, but nothing works.
Can I do this with regex?
Upvotes: 1
Views: 1355
Reputation: 10455
You can, but it's not pretty, and it's going to be a lot less efficient than just grabbing all sentences and testing them for 'else' afterwards. Unless there's absolutely, positively no way you can exclude the 'else's before or after, I'd urge you to reconsider how you're approaching the problem.
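Disclaimer aside, a quick test shows that /(?:^|\.\s+)(([^\.](?!else))+)(?=\.)/im works for this input. Assume it's hideously inefficient, though. Note also that the lookahead is checked after each consumed character, so a sentence that begins with "else" would still slip through.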
A quick test script in PHP:
$re = '/(?:^|\.\s+)(([^\.](?!else))+)(?=\.)/im';
$input = "Lorem ipsum text. Second lorem ipsum. How are You. It's ok. Done. Something else now.
New line. Halo. Text. Are You ok.";
// $m[1] holds the first capture group: each matched sentence without its trailing full stop.
preg_match_all($re, $input, $m);
var_dump($m[1]);
Produces:
array(9) {
[0]=> string(16) "Lorem ipsum text"
[1]=> string(18) "Second lorem ipsum"
[2]=> string(11) "How are You"
[3]=> string(7) "It's ok"
[4]=> string(4) "Done"
[5]=> string(8) "New line"
[6]=> string(4) "Halo"
[7]=> string(4) "Text"
[8]=> string(10) "Are You ok"
}
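For comparison, here is a minimal Python sketch of the same idea. It is not a literal port: it assumes you are happy to move the lookahead in front of each character (a variation on the pattern above, not the pattern itself), so that a sentence starting with "else" is rejected as well. The names PATTERN, text and sentences are only illustrative.

```python
import re

# Variation on the pattern above: (?!else) is tested BEFORE each character,
# so the sentence cannot contain "else" anywhere, including at its start.
PATTERN = re.compile(
    r"(?:^|\.\s+)((?:(?!else)[^.])+)(?=\.)",
    re.IGNORECASE | re.MULTILINE,
)

text = (
    "Lorem ipsum text. Second lorem ipsum. How are You. It's ok. Done. "
    "Something else now.\nNew line. Halo. Text. Are You ok."
)

# With a single capture group, findall returns the captured sentences.
sentences = PATTERN.findall(text)
print(sentences)
```

Like the original, this rejects any sentence containing the letters "else", so "elsewhere" is excluded too. Replacing else with \belse\b inside the lookahead should restrict it to the whole word.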
Upvotes: 1
Reputation: 6677
This is easier if you invert your approach: instead of constructing a regexp that matches sentences that do not contain "else", write one that matches sentences that do contain "else" (as sgreeve suggested), then keep the sentences that don't match.
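A rough Python sketch of that inverted approach, assuming sentences are simply whatever sits between full stops and that line breaks inside a sentence can be collapsed to spaces. The function name sentences_without_else and the constant HAS_ELSE are made up for this example.

```python
import re

# Matches the whole word "else" in any letter case.
HAS_ELSE = re.compile(r"\belse\b", re.IGNORECASE)

def sentences_without_else(text):
    # Split on full stops, then collapse runs of whitespace
    # (including line breaks) inside each piece to single spaces.
    parts = (" ".join(piece.split()) for piece in text.split("."))
    # Drop empty pieces and any sentence that contains "else".
    return [s for s in parts if s and not HAS_ELSE.search(s)]

text = (
    "Lorem ipsum text. Second lorem ipsum. How are You. It's\n"
    "ok. Done. Something else now.\n"
    "New line. Halo. Text. Are You ok."
)
print(sentences_without_else(text))
```

Splitting first keeps the regex trivial, and the filtering step stays easy to change if the exclusion rule grows beyond a single word.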
Upvotes: 0
Reputation: 343211
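If you are on Unix, you can use awk with the record separator set to a full stop. Each record keeps its internal line breaks, so "It's ok" prints on two lines: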
$ awk -vRS="." '!/else/' file
"Lorem ipsum text
Second lorem ipsum
How are You
It's
ok
Done
New line
Halo
Text
Are You ok
"
Upvotes: 0
Reputation: 7144
Yes, you can use a regex to match strings containing "else" very easily. The expression is very simple:
\belse\b
The \b at either end of the expression indicates a "word boundary", which means that the expression will only match the whole word else and will not match when else is part of another word. Note, however, that punctuation characters are not word characters, so a word boundary still matches between else and a following full stop, which is useful if you're parsing sentences as you are here.
Hence the expression \belse\b will match these sentences:
// note the full stop
...but not this one...
You don't say which language you're coding in, but here's a quick example in C#, using the System.Text.RegularExpressions.Regex class and written as an NUnit test:
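using System.Text.RegularExpressions;
using NUnit.Framework;

// The fixture class name is illustrative; NUnit only needs the [Test] method to live in a class.
[TestFixture]
public class ElseRegexTests
{
    [Test]
    public void RegexTest()
    {
        // This test passes
        string test1 = "This is a sentence which contains the word else";
        string test2 = "This is a sentence which does not";
        string test3 = "Blah blah else blah blah";
        string test4 = "This is a sentence which contains the word else.";
        // Verbatim string, so the backslashes need no doubling.
        Regex regex = new Regex(@"\belse\b");
        Assert.True(regex.IsMatch(test1));
        Assert.False(regex.IsMatch(test2));
        Assert.True(regex.IsMatch(test3));
        Assert.True(regex.IsMatch(test4)); // the trailing full stop does not prevent a match
    }
}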
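The test above does not cover the case where else is embedded in a longer word. A small Python sketch of that case, assuming Python's re module treats \b the same way for these plain ASCII samples (the sample sentences are invented for illustration):

```python
import re

word_else = re.compile(r"\belse\b")

# Each sample maps to whether the whole word "else" should be found.
samples = {
    "Something else now.": True,    # whole word, followed by a space
    "What else.": True,             # whole word, followed by a full stop
    "Look elsewhere.": False,       # "else" is only part of a longer word
    "Nothing to see here.": False,  # no "else" at all
}

for sentence, expected in samples.items():
    assert bool(word_else.search(sentence)) == expected
```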
Upvotes: 0