Using NLTK methods such as tokenize on annotated text

Question

Say I have a corpus of annotated text where a sentence looks something like:

txt = 'red foxes scare me.'

Is it possible to tokenize this using word_tokenize in such a way that we get:

['red', 'foxes', 'scare', 'me', '.']

We could use an alternative annotation scheme, say:

txt = 'red foxes scare_EMOTION me'

Is it possible to do this with NLTK? Currently I'm parsing out the annotations and tracking them out of band, which is very cumbersome.
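
To make the suffix scheme concrete, here is a minimal sketch of what I have in mind: tokenize first, then peel a trailing `_LABEL` off each token so the word and its annotation stay paired. The names `split_annotated`, `tokenize`, and `sep` are made up for illustration and are not part of NLTK. The sketch uses plain `str.split` as a stand-in tokenizer; passing `nltk.word_tokenize` instead should work only if it leaves `scare_EMOTION` as a single token, which is an assumption I have not verified here.

```python
from typing import Callable, List, Optional, Tuple


def split_annotated(
    text: str,
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in; swap in an NLTK tokenizer
    sep: str = "_",
) -> List[Tuple[str, Optional[str]]]:
    """Tokenize `text`, then split a trailing `sep` + UPPERCASE label off each token.

    Hypothetical helper: returns (word, label) pairs, with label None
    for tokens that carry no annotation.
    """
    pairs = []
    for token in tokenize(text):
        word, found, label = token.rpartition(sep)
        # Only treat the suffix as an annotation if it looks like a label.
        if found and word and label.isupper():
            pairs.append((word, label))
        else:
            pairs.append((token, None))
    return pairs


txt = 'red foxes scare_EMOTION me'
pairs = split_annotated(txt)
tokens = [word for word, _ in pairs]
labels = [label for _, label in pairs]
print(pairs)
```

This keeps tokens and labels aligned by index, so nothing has to be tracked out of band. NLTK's `nltk.tag.str2tuple(s, sep)` does a similar per-token split (it also uppercases the tag), so it may be possible to use that instead of the hand-written `rpartition` step.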

Answers (1)