nltk: Text classification using custom feature set

Question

I have a dataset that looks like this:

featureDict = {identifier1: [[first 3-gram], [second 3-gram], ... [last 3-gram]],
               ...
               identifierN: [[first 3-gram], [second 3-gram], ... [last 3-gram]]}

Plus I have a dict of labels for the same set of documents:

labelDict = {identifier1: label1,
             ...
             identifierN: labelN}

I want to figure out the most appropriate nltk container in which I can store this information in one place and seamlessly apply the nltk classifiers.

Additionally, before I use any classifiers on this dataset I'd also like to use a tf-idf filter on this features space.

References and documentation will be helpful.

nltk: Text classification using custom feature set

Answers (1)

Related Questions