Inherited Geek

Reputation: 2393

NLP & ML Phrase Extraction

Which ML algorithms can I use to train a model that extracts the action phrase from a given sentence?

Sentence1: I want to play cricket
Label1: play cricket

Sentence2: Need to wash my clothes
Label2: wash clothes

I have about 2k sentences with their corresponding action phrases (labels), and I need to predict the action phrases for another batch of sentences based on them. Can someone guide me on how to do this using NLP/ML? Which algorithms should I use (preferably in Python)?

Upvotes: 0

Views: 998

Answers (2)

Arjun

Reputation: 325

Here's the process of sentence classification:

1) Normalize the text: convert all text to lower case.

2) Remove all stop words, so that only the relevant features are left.

3) Tokenize the sentences into unigram tokens.

4) Apply stemming: try out different stemmers or a lemmatizer to reduce words to their base form, and see which one works best for your case. For example, "play", "played" and "plays" are all reduced to the base word "play". This step reduces the number of features.

5) Create a term-document matrix (TDM) for all the sentences. Each row of the TDM corresponds to a sentence, and each column corresponds to a distinct token from the whole set of sentences. (Tf-idf is another way of representing text as a matrix.)

6) The term-document matrix now has tokens as columns, and you already have the labels in place, so you can start training ML models. I'm assuming you know how to do this part.
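Steps 1 to 5 can be sketched in plain Python as follows. This is only a minimal illustration, not a recommended implementation: the stop-word list and the suffix-stripping `stem` function are toy stand-ins (assumptions) for a real stop-word list and a real stemmer or lemmatizer, and the function names are made up for this example.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"i", "to", "my", "the", "a", "an"}


def stem(token):
    """Crude suffix stripper standing in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, and stem (steps 1 to 4)."""
    # Tokenizing happens before stop-word removal here, because the
    # stop-word check is done token by token.
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    return [stem(t) for t in tokens if t and t not in STOP_WORDS]


def term_document_matrix(sentences):
    """Return (vocabulary, rows): one row of token counts per sentence (step 5)."""
    docs = [preprocess(s) for s in sentences]
    vocabulary = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocabulary])
    return vocabulary, rows


sentences = ["I want to play cricket", "Need to wash my clothes"]
vocabulary, matrix = term_document_matrix(sentences)
print(vocabulary)
print(matrix)
```

Each row of `matrix` lines up with one sentence and each column with one entry of `vocabulary`, so the rows can be passed to a classifier together with the labels.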

Upvotes: 1

avip

Reputation: 1465

Take a look at NLTK's Naive Bayes classifier. It is multiclass, and you can train it on your sentence/label pairs.

NaiveBayesClassifier.train() expects training features; I would start with the features simply being the words in each sentence. You can then refine the feature selection with more complex methods until you get the results you want.

You can use nltk.classify.util.accuracy to evaluate results. Remember to split your sentences into training and test data.
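A rough sketch of that workflow might look like the following. The helper names (`word_features`, `split_pairs`, `train_and_evaluate`), the 80/20 split, and the fixed seed are assumptions made for illustration; only `NaiveBayesClassifier.train` and `nltk.classify.util.accuracy` come from NLTK itself.

```python
import random


def word_features(sentence):
    """Bag-of-words feature dict: one boolean feature per lowercased word."""
    return {word: True for word in sentence.lower().split()}


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle (sentence, label) pairs and split them into train and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_and_evaluate(pairs):
    """Train NLTK's Naive Bayes classifier and return it with its test accuracy."""
    # Imported lazily so the helpers above can be used without NLTK installed.
    from nltk import NaiveBayesClassifier
    from nltk.classify.util import accuracy

    train_pairs, test_pairs = split_pairs(pairs)
    train_set = [(word_features(s), label) for s, label in train_pairs]
    test_set = [(word_features(s), label) for s, label in test_pairs]
    classifier = NaiveBayesClassifier.train(train_set)
    return classifier, accuracy(classifier, test_set)
```

With `pairs` holding the ~2k (sentence, label) tuples, `classifier, acc = train_and_evaluate(pairs)` trains the model, and `classifier.classify(word_features("I want to play cricket"))` predicts a label for a new sentence. Note that a classifier can only return labels it saw during training, so this works best when the same action phrases recur across sentences.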

Upvotes: 0
