ekapek

Reputation: 193

Regex - find all phrases except the ones including a specific word

I have a text file with text like:

"Lorem ipsum text. Second lorem ipsum. How are You. It's 
ok. Done. Something else now.

New line. Halo. Text. Are You ok."

I need a regex to find all sentences (delimited by .) except those containing the word "else". I've tried many regex patterns, but nothing works.

Can I do this with regex?

Upvotes: 1

Views: 1355

Answers (5)

Chris

Reputation: 10455

You can, but it's not pretty, and it's going to be a lot less efficient than just grabbing all sentences and testing them for 'else' afterwards. Unless there's absolutely, positively no way you can exclude the 'else's before or after, I'd urge you to reconsider how you're approaching the problem.

Disclaimer aside, a quick test shows /(?:^|\.\s+)(([^\.](?!else))+)(?=\.)/im works. Assume it's hideously inefficient though.

A quick test script in PHP:

$re = '/(?:^|\.\s+)(([^\.](?!else))+)(?=\.)/im';

$input = "Lorem ipsum text. Second lorem ipsum. How are You. It's ok. Done. Something else now.

New line. Halo. Text. Are You ok.";

preg_match_all($re, $input, $m);
var_dump($m[1]);

Produces:

array(9) {
  [0]=> string(16) "Lorem ipsum text"
  [1]=> string(18) "Second lorem ipsum"
  [2]=> string(11) "How are You"
  [3]=> string(7) "It's ok"
  [4]=> string(4) "Done"
  [5]=> string(8) "New line"
  [6]=> string(4) "Halo"
  [7]=> string(4) "Text"
  [8]=> string(10) "Are You ok"
}
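
One caveat with the pattern above: the lookahead is tested after each consumed character, so a sentence that begins with "else" slips through, and words such as "elsewhere" are rejected too. A minimal sketch of a variant, written in Python on the assumption that the asker's language has a comparable regex engine, tests the lookahead before each character and adds \b word boundaries. The name SENTENCE_WITHOUT_ELSE is only illustrative.

```python
import re

# Variant of the pattern above (an assumption, not the original answer's regex):
# the negative lookahead runs *before* each character is consumed, and \b
# restricts it to the whole word "else".
SENTENCE_WITHOUT_ELSE = re.compile(
    r"(?:^|\.\s+)((?:(?!\belse\b)[^.])+)(?=\.)",
    re.IGNORECASE | re.MULTILINE,
)

text = (
    "Lorem ipsum text. Second lorem ipsum. How are You. It's ok. "
    "Done. Something else now.\n\nNew line. Halo. Text. Are You ok."
)

# findall returns the contents of the single capturing group for each match.
sentences = SENTENCE_WITHOUT_ELSE.findall(text)
print(sentences)

# A sentence starting with "Else" is skipped; "elsewhere" is not treated as "else".
print(SENTENCE_WITHOUT_ELSE.findall("Else it fails. Fine."))
print(SENTENCE_WITHOUT_ELSE.findall("Go elsewhere. Done."))
```

On the sample text this should give the same nine sentences as the PHP run above.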

Upvotes: 1

markusk

Reputation: 6677

This is easier if you invert your approach: instead of constructing a regexp matching lines that do not contain "else", make one matching lines that do contain "else" (like sgreeve suggested), then select the lines that don't match.
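
A rough sketch of that inverted approach in Python, assuming sentences are simply whatever sits between full stops and that "else" should count only as a whole word. The helper name sentences_without_else is hypothetical.

```python
import re

# Matches the whole word "else" in any letter case (assumed requirement).
CONTAINS_ELSE = re.compile(r"\belse\b", re.IGNORECASE)

def sentences_without_else(text):
    """Split on full stops, then keep the sentences that do NOT match."""
    # Trim surrounding whitespace; the split leaves an empty tail after the last "."
    candidates = [part.strip() for part in text.split(".")]
    return [s for s in candidates if s and not CONTAINS_ELSE.search(s)]

text = "How are You. It's\nok. Done. Something else now.\n\nNew line. Halo."
print(sentences_without_else(text))
```

The regex itself stays trivial here; all the selection logic lives in ordinary code, which tends to be easier to read and adjust than a single pattern with lookaheads.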

Upvotes: 0

ghostdog74

Reputation: 343211

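With GNU sed (needed for \n in the replacement), put each sentence on its own line, then delete the lines that contain "else":

sed 's/[^.]*\./&\n/g' textfile | sed '/else/d'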

Upvotes: 0

ghostdog74

Reputation: 343211

If you are on Unix, you can use awk.

$ awk -vRS="." '!/else/' file
"Lorem ipsum text
 Second lorem ipsum
 How are You
 It's
ok
 Done


New line
 Halo
 Text
 Are You ok
"

Upvotes: 0

razlebe

Reputation: 7144

Yes, you can use a regex to match strings containing "else" very easily. The expression is very simple:

\belse\b

The \b at either end of the expression indicates a "word boundary", which means that the expression will only match the whole word else and will not match when else is part of another word. Note, however, that punctuation characters are not word characters, so a word boundary also falls between else and an adjacent punctuation mark, which is useful when parsing sentences as you are here.

Hence the expression \belse\b will match these sentences:

  • Blah blah else blah
  • else blah blah blah
  • blah blah blah else
  • blah blah blah else. // note the full stop

...but not this one...

  • blah blahelse blah

You don't say which language you're coding in, but here's a quick example in C#, using the System.Text.RegularExpressions.Regex class and written as an NUnit test:

        [Test]
        public void regexTest()
        {
            // This test passes

            String test1 = "This is a sentence which contains the word else";
            String test2 = "This is a sentence which does not";
            String test3 = "Blah blah else blah blah";
            String test4 = "This is a sentence which contains the word else.";

            Regex regex = new Regex("\\belse\\b");
            Assert.True(regex.IsMatch(test1));
            Assert.False(regex.IsMatch(test2));
            Assert.True(regex.IsMatch(test3));
            Assert.True(regex.IsMatch(test4));
        }

Upvotes: 0
