Reputation: 173
How to use WEKA to find keyphrases with a supervised method?
I have to train a model for keyphrase extraction. I have a training corpus in which every document has a corresponding file containing its keyphrases (or keywords).
I also have a test corpus for the supervised model (documents without a keyphrase file), so the model should output a list of keyphrases for every test document.
My question is how to input the documents into WEKA. Should I add a string attribute with one instance per document, like this?
@attribute doc string
@data
"Docu1............"
"Docu2............"
...
"DocuN............"
Now, how do I input the files that contain the keyphrases for every document, so that the model can learn from them?
Upvotes: 0
Views: 132
Reputation: 310
First, you need to choose which features you want to use. The most basic approach relies mainly on tf-idf values; see KEA: https://code.google.com/p/kea-algorithm/ You can also extend these features with your own task-specific features, for example the position of the first occurrence of the phrase. You can find some possible features in this article: http://www.aclweb.org/anthology/S/S10/S10-1040.pdf

Then you have to choose a machine learning algorithm, train it on your training set, and evaluate it on your test set.
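To tie this back to the ARFF question: one way to set it up is to make each candidate phrase an instance, rather than each document, and to use the keyphrase files only to fill in the class label. Below is a minimal Python sketch of that idea under several assumptions: candidates are plain word n-grams of up to three words (no stemming or stopword filtering, and n-grams may cross sentence boundaries), a candidate is labeled `yes` only on an exact match with a listed keyphrase, and the names `tokenize`, `candidates`, `build_instances`, and `to_arff` are hypothetical helpers, not part of WEKA or KEA.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a deliberate simplification).
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens, max_len=3):
    # For every n-gram (1..max_len words): index of first occurrence, and count.
    first, count = {}, Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            first.setdefault(phrase, i)
            count[phrase] += 1
    return first, count


def build_instances(docs, gold=None, max_len=3):
    # docs: list of document strings.
    # gold: list of keyphrase collections (one per document), or None for test data.
    tokenized = [tokenize(d) for d in docs]
    per_doc = [candidates(t, max_len) for t in tokenized]

    # Document frequency: in how many documents does each candidate appear?
    df = Counter()
    for first, _ in per_doc:
        df.update(first.keys())

    rows = []
    for doc_id, (first, count) in enumerate(per_doc):
        n_tokens = max(1, len(tokenized[doc_id]))
        if gold is not None:
            # Normalize gold keyphrases the same way as the candidates.
            keys = {" ".join(tokenize(k)) for k in gold[doc_id]}
        for phrase, pos in first.items():
            tfidf = (count[phrase] / n_tokens) * math.log(len(docs) / df[phrase])
            # "?" is the ARFF marker for a missing (unknown) value.
            label = "?" if gold is None else ("yes" if phrase in keys else "no")
            rows.append((doc_id, phrase, tfidf, pos / n_tokens, label))
    return rows


def to_arff(rows, relation="keyphrase_candidates"):
    # Only the numeric features and the class go into the ARFF file;
    # doc_id and phrase stay in `rows` so predictions can be mapped back.
    lines = [
        f"@relation {relation}",
        "@attribute tfidf numeric",
        "@attribute first_occurrence numeric",
        "@attribute class {yes,no}",
        "@data",
    ]
    lines += [f"{tfidf:.4f},{first:.4f},{label}" for _, _, tfidf, first, label in rows]
    return "\n".join(lines) + "\n"
```

You would call `build_instances(train_docs, train_keyphrases)` for the training corpus and `build_instances(test_docs)` for the test corpus, write both results with `to_arff`, and load the files in WEKA. Since row order is preserved, the i-th prediction belongs to the i-th entry of `rows`, which tells you the document and the phrase. Note that idf is computed here within whichever corpus you pass in; in practice you would probably want to reuse the training corpus statistics for the test documents.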
Upvotes: 1