Pr0no

Reputation: 4109

Negating sentences using POS-tagging

I'm trying to find a way to negate sentences based on POS-tagging. Please consider:

include_once 'class.postagger.php';

function negate($sentence) {
  $tagger = new PosTagger('includes/lexicon.txt');
  $tags = $tagger->tag($sentence);
  $input = []; // initialise so implode() still gets an array when no tags come back
  foreach ($tags as $t) {
    $input[] = trim($t['token']) . "/" . trim($t['tag']) .  " ";
  }
  $sentence = implode(" ", $input);
  $postagged = $sentence;

  // Prepend "not" to every JJ, MD, RB, VB, VBD or VBN token
  // Todo: ignore negative words (not, never, neither)
  $sentence = preg_replace("/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/", "not$1/$2", $sentence);

  // Remove all POS tags
  $sentence = preg_replace("/\/[A-Z$]+/", "", $sentence);

  return "$postagged<br>$sentence";
}

BTW: In this example, I'm using the POS-tagging implementation and lexicon of Ian Barber. An example of this code running would be:

echo negate("I will never go to their place again");
I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB 
I notwill notnever notgo to their place notagain

As you can see (and as the todo comment in the code already notes), negative words are themselves being negated as well: never becomes notnever, which obviously shouldn't happen. Since my regex skills aren't great, is there a way to exclude these words from the regex?

[edit] Also, I would very much welcome any other comments or critiques you might have on this negation implementation, since I'm sure it's (still) quite flawed :-)

Upvotes: 6

Views: 849

Answers (1)

Nate

Reputation: 1303

Give this a try:

$sentence = preg_replace("/(\s)(?:(?!never|neither|not)(\w*))\/(JJ|MD|RB|VB|VBD|VBN)\b/", "$1not$2", $sentence);
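
Two caveats with the pattern above, plus a possible variant. The leading (\s) means the first token of the string is never negated, and the lookahead only tests the start of the word, so words that merely begin with "not" (notice, nothing) are skipped too. Below is a minimal sketch that compares the whole token against a stop list instead. The function name negate_tagged and the contents of $negatives are my own assumptions, not part of the original code or of Ian Barber's tagger, and it expects the already tagged "token/TAG" string rather than the raw sentence.

```php
<?php
// Hypothetical helper (not from the original post): negates a string of
// "token/TAG" pairs while leaving words that are already negative untouched.
function negate_tagged(string $tagged): string
{
    // Assumed stop list of negative words; extend as needed.
    $negatives = ['not', 'never', 'neither', 'nor', 'no'];

    // (?<!\S) matches at the start of the string or after whitespace, so the
    // first token is handled too; (?!\S) stops "VB" from matching inside "VBZ".
    $pattern = '~(?<!\S)(\w+)/(JJ|MD|RB|VB|VBD|VBN)(?!\S)~';

    $negated = preg_replace_callback($pattern, function (array $m) use ($negatives) {
        // Compare the complete token, so "notice" or "nothing" is still negated.
        if (in_array(strtolower($m[1]), $negatives, true)) {
            return $m[0]; // already negative: leave as is
        }
        return 'not' . $m[1] . '/' . $m[2];
    }, $tagged);

    // Remove the POS tags, then collapse repeated whitespace.
    $plain = preg_replace('~/[A-Z$]+~', '', $negated);
    return trim(preg_replace('~\s+~', ' ', $plain));
}

echo negate_tagged('I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB');
// I notwill never notgo to their place notagain
```

Keeping the stop list in an array rather than inside the regex makes it easier to extend later without touching the pattern.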

Upvotes: 3
