Sotoy

Reputation: 13

How to replace a token (CoreLabel) in a sentence (CoreMap) using Stanford NLP?

As usual, I traverse the sentences of an annotated document in a for-loop (Java):

for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
    ...
}

Inside the loop, I try to replace a word (e.g. "teacher") with "John": I remove the token from the sentence using its word index, set the new text with the CoreLabel method setWord(), and finally add the updated token back to the sentence at the same index:

sentence.get(CoreAnnotations.TokensAnnotation.class).remove(token.get(CoreAnnotations.IndexAnnotation.class));
token.setWord("John");
sentence.get(CoreAnnotations.TokensAnnotation.class).add(token.get(CoreAnnotations.IndexAnnotation.class),token);

The problem is that the sentence stays as it is. Even if I print the sentence text right after the removal, it is unchanged. Am I doing something wrong? Is there a more reasonable way to do this?

Upvotes: 0

Views: 269

Answers (1)

Gabor Angeli

Reputation: 5759

I'm going to venture that even though you've changed the word, you haven't changed the originalText. In general, you should be a bit wary of these sorts of transformations: they can have all sorts of bizarre effects (e.g., your character offsets will be broken). But if you're feeling brave and want to fix the bug at hand, setting the original text as well should do it:

token.setOriginalText("John");
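
Separately, it may be worth checking the list manipulation itself. The value returned by token.get(CoreAnnotations.IndexAnnotation.class) is a boxed Integer, so on a plain java.util.List the call remove(thatValue) resolves to remove(Object) rather than remove(int), and typically removes nothing; the following add(thatValue, token) does unbox and inserts at that position. Token indices in CoreNLP also normally start at 1, while list positions start at 0. The sketch below reproduces the effect with plain strings standing in for tokens; the class and method names are hypothetical, and it assumes the tokens list behaves like an ordinary ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: strings play the role of CoreLabel tokens.
public class TokenReplaceSketch {

    // Mirrors the pattern in the question: remove by a boxed index, then re-insert.
    static List<String> replaceWithBoxedIndex(List<String> tokens, Integer oneBasedIndex, String replacement) {
        List<String> copy = new ArrayList<>(tokens);
        copy.remove(oneBasedIndex);            // remove(Object): no element equals the Integer, so nothing is removed
        copy.add(oneBasedIndex, replacement);  // add(int, E) after unboxing: inserts at that list position
        return copy;
    }

    // Alternative: overwrite the slot directly, converting the 1-based index to a 0-based position.
    static List<String> replaceInPlace(List<String> tokens, int oneBasedIndex, String replacement) {
        List<String> copy = new ArrayList<>(tokens);
        copy.set(oneBasedIndex - 1, replacement);
        return copy;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "teacher", "smiled");
        System.out.println(replaceWithBoxedIndex(tokens, 2, "John")); // [The, teacher, John, smiled]
        System.out.println(replaceInPlace(tokens, 2, "John"));        // [The, John, smiled]
    }
}
```

Since setWord() mutates the CoreLabel that is already in the list, removing and re-adding it is probably unnecessary in the first place; the stored sentence text is a separate annotation and is not regenerated from the tokens either way.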

Upvotes: 1
