Guess tags of a paragraph programmatically using Python

Question

I've been trying to read about NLP in general, and nltk in particular, to use with Python. I'm not sure whether what I'm looking for already exists or whether I need to develop it myself.

I have a program that collects text from different files. The text is extremely random and covers many different subjects. Each file contains one paragraph, or three at most; my program opens the files and stores their contents in a table.

My question is: can I guess tags describing what each paragraph is about? If anyone knows of an existing technology or approach, I would really appreciate it.

Thanks,

alexis · Accepted Answer

Your task is called "document classification", and the nltk book has a whole chapter on it. I'd start with that.

It all depends on your criteria for assigning tags. Are you interested in matching your documents against a pre-existing set of tags, or perhaps in topic extraction (select the N most important words or phrases in the text)?
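
If topic extraction is closer to what you want, here is a minimal sketch of the "N most important words" idea, using plain word frequency as a stand-in for importance. The function name `guess_tags`, the tiny stopword list, and the sample paragraph are all made up for illustration; a real setup would likely use a fuller stopword list (nltk ships one) and a better weighting such as TF-IDF across all your files.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch;
# a real list would be much longer).
STOPWORDS = {
    "the", "a", "an", "and", "or", "in", "on", "of", "to", "is", "are",
    "can", "it", "this", "that", "with", "for", "by", "be", "as", "at",
}

def guess_tags(text, n=3):
    """Return the n most frequent words in text, ignoring stopwords."""
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Count only words that are not stopwords and are longer than 2 characters.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

# Hypothetical paragraph standing in for the contents of one file.
paragraph = (
    "The python interpreter runs python scripts. "
    "Scripts written in python can call the interpreter again."
)

print(guess_tags(paragraph, n=3))
```

Raw frequency is crude: it favours words that are common everywhere, which is why comparing counts against the rest of your table (the idea behind TF-IDF) usually gives better tags.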
