Reputation: 2040
I'm trying to create a chat bot for our company: we send messages to the bot, which then uses OpenNLP to parse the string and run some scripts.
An example query would be:
"I'm going to work on ProjectY, can you close ProjectX?"
That should fire the script closeRepo.sh with argument ProjectX.
The problem I have is that it correctly parses the sentence above as 2 parts:
"I'm going to work on ProjectY"
and "can you close ProjectX"
However, not all possible projects are parsed correctly. Sometimes I have a project name that OpenNLP doesn't see as an NP but as an adverb phrase (ADVP) or something else. I think it reads the sentence like "can you close fast" or something similar.
This is my parsing code. I left out the model loading (I use the standard models provided here: http://opennlp.sourceforge.net/models-1.5/)
String[] sentences = sentenceDetector.sentDetect(input);
for (int i = 0; i < sentences.length; i++) {
    // Re-join the tokens with single spaces, because ParserTool.parseLine
    // expects whitespace-separated tokens. String.join also copes with an
    // empty token array, where deleting the "last space" would throw.
    String[] tokens = tokenizer.tokenize(sentences[i]);
    sentences[i] = String.join(" ", tokens);
}
ArrayList<Parse> parses = new ArrayList<>();
for (String s : sentences) {
    // Ask the parser for only the single best parse of each sentence
    Parse[] topParses = ParserTool.parseLine(s, parser, 1);
    if (topParses.length > 0) {
        parses.add(topParses[0]);
    }
}
return parses;
I'd be willing to switch to Stanford's NLP tools if that would make it easier. But my question is:
Is there a way to give opennlp a list of my projects and get them detected as NP or NN?
Upvotes: 0
Views: 638
Reputation: 3953
You would probably be better off using the OpenNLP sentence chunker, which works nicely, and then checking whether any noun phrase contains one of your project names. Something like this:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

/**
 * Extracts noun phrases from a sentence. To split text into sentences with
 * OpenNLP, use the SentenceDetector classes.
 */
public class OpenNLPNounPhraseExtractor {

    public static void main(String[] args) {
        String modelPath = "c:\\temp\\opennlpmodels\\";
        // try-with-resources closes the model streams once the models are loaded
        try (InputStream tokenIn = new FileInputStream(modelPath + "en-token.zip");
             InputStream posIn = new FileInputStream(modelPath + "en-pos-maxent.zip");
             InputStream chunkerIn = new FileInputStream(modelPath + "en-chunker.zip")) {
            TokenizerME wordBreaker = new TokenizerME(new TokenizerModel(tokenIn));
            POSTaggerME posme = new POSTaggerME(new POSModel(posIn));
            ChunkerME chunkerME = new ChunkerME(new ChunkerModel(chunkerIn));

            // this is your sentence
            String sentence = "Barack Hussein Obama II is the 44th President of the United States, and the first African American to hold the office.";
            // words is the tokenized sentence
            String[] words = wordBreaker.tokenize(sentence);
            // posTags are the parts of speech of every word (the chunker needs this info)
            String[] posTags = posme.tag(words);
            // chunks are the start/end "spans" (indices into the words array) of each chunk
            Span[] chunks = chunkerME.chunkAsSpans(words, posTags);
            // chunkStrings are the actual chunk texts
            String[] chunkStrings = Span.spansToStrings(chunks, words);
            for (int i = 0; i < chunks.length; i++) {
                // the chunker also returns VP, PP, etc., so keep only noun phrases
                if (!"NP".equals(chunks[i].getType())) {
                    continue;
                }
                String np = chunkStrings[i];
                if (np.contains("some project name")) {
                    System.out.println(np);
                    // do something here
                }
            }
        } catch (IOException e) {
            // don't swallow model-loading failures silently
            e.printStackTrace();
        }
    }
}
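If the chunker or tagger mislabels a project name, a plain lookup over the tokens can serve as a fallback, since it does not depend on the part-of-speech tags at all. The following is only a minimal sketch under some assumptions: the class name ProjectNameMatcher and the method findProjects are made up for illustration, project names are assumed to be single tokens, and matching is case-insensitive. The tokens would come from the same tokenizer used above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP.
class ProjectNameMatcher {

    /** Returns the tokens that match a known project name, in sentence order. */
    static List<String> findProjects(String[] tokens, Set<String> knownProjects) {
        // Lower-case the known names once so that matching is case-insensitive.
        Set<String> lookup = new HashSet<>();
        for (String name : knownProjects) {
            lookup.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> found = new ArrayList<>();
        for (String token : tokens) {
            if (lookup.contains(token.toLowerCase(Locale.ROOT))) {
                found.add(token); // keep the token as it was typed
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> projects = new HashSet<>();
        projects.add("ProjectX");
        projects.add("ProjectY");
        // Tokens as a tokenizer might produce them for the second sentence.
        String[] tokens = {"can", "you", "close", "ProjectX", "?"};
        System.out.println(findProjects(tokens, projects));
    }
}
```

Multi-word project names would need a comparison over token windows rather than single tokens, which this sketch does not attempt.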
BTW, what you are trying to do sets extremely high expectations for a statistical NLP approach. Sentence chunking is based on a model, and if your chats don't fit the general shape of the data the model was trained on, your results are going to be problematic, regardless of whether you use OpenNLP, Stanford, or anything else. It sounds like you are also trying to extract an "action to take" in relation to the project name NP; for that you could experiment with verb phrase extraction. I don't recommend automatically firing off sh scripts based on a probabilistic parsing of potentially noisy sentences!
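One way to act on that caution is to have the bot propose a command and wait for a confirmation instead of running it directly. Here is a rough sketch, assuming single-token project names and a fixed action word "close". The class CloseCommandProposer and its propose method are hypothetical names; only closeRepo.sh comes from the question. A real version would probably take the verb from a VP chunk rather than from a hard-coded token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP.
class CloseCommandProposer {

    /**
     * Proposes "closeRepo.sh <project>" only when the token "close" is
     * followed later in the same sentence by a known project name.
     * Nothing is executed here; the caller should ask the user to confirm.
     */
    static Optional<String> propose(String[] tokens, Set<String> knownProjects) {
        boolean sawClose = false;
        for (String token : tokens) {
            if (token.equalsIgnoreCase("close")) {
                sawClose = true;
            } else if (sawClose && knownProjects.contains(token)) {
                return Optional.of("closeRepo.sh " + token);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> projects = new HashSet<>(Arrays.asList("ProjectX", "ProjectY"));
        String[] request = {"can", "you", "close", "ProjectX", "?"};
        String[] statement = {"I", "'m", "going", "to", "work", "on", "ProjectY"};
        // Show the proposal as a question instead of running the script.
        propose(request, projects)
                .ifPresent(cmd -> System.out.println("Run \"" + cmd + "\"? (yes/no)"));
        // A sentence without the action word yields no proposal.
        System.out.println(propose(statement, projects).isPresent());
    }
}
```

Keeping the proposal separate from the execution means a misparsed or noisy sentence costs the user one "no" rather than a closed repository.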
Upvotes: 1