Custom OpenNLP Name Finder recognizes data in training set, but not in testing set

Question

I finally got OpenNLP incorporated into my project. I have trained a custom name finder model on 15k lines of training data, stored it, and can load it whenever I want to recognize entities in my program.

I am using it to recognize hashtags, so my training data looks something like this:

    ...
    Jim , I know you to be a fighter  #usmarine  @ USMC Kira has your strength & amp ; ours @ t1r1u1t1h R love 2 U , Kira & amp ; 
    What has changed that people from your JAMAT are insulting Hindu GODS and GODDESSES . Calling our Religion names ... . 
    Ibtihaj represented the United States of America at the Olympics and brought home a medal , elevating the status of 
    A story point is a metric used in agile project management and development to determine ( or estimate ) the difficul 
    I 'm not shy or quiet , I just do n't find your mind appealing in any way shape or form and I 'm not gon na force a conv 
     #paradisepapers  , Canadian Taxpayers Federation ( CTF ) & amp ; tax reform `` CTF has not uttered even a single shocked-and-a 
    ...
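For reference, OpenNLP's name finder training format expects one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type> ... <END>` markers. The excerpt above shows no such markers, which may only be a display artifact. Below is a minimal sketch of how such annotated lines could be generated. The class name `HashtagAnnotator`, the entity type `hashtag`, and the "starts with #" rule are assumptions for illustration, not something taken from my actual pipeline.

```java
import java.util.StringJoiner;

class HashtagAnnotator {
    // Assumed entity type; it has to match the type the model is trained with.
    static final String TYPE = "hashtag";

    /** Wraps every whitespace-delimited token that starts with '#' in START/END markers. */
    static String annotate(String tokenizedLine) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : tokenizedLine.trim().split("\\s+")) {
            if (token.length() > 1 && token.charAt(0) == '#') {
                out.add("<START:" + TYPE + "> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is expected to be tokenized already (tokens separated by spaces).
        System.out.println(annotate("Take a shot for #harambe he took one for you !"));
    }
}
```

Each annotated line would then go into the training file, one sentence per line.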

I am finding that the model is unable to recognize any hashtags if it is passed a sentence that is not directly in my training set, such as:

    String paragraph = "Take a shot for #harambe he took one for you!";

The model does not recognize the hashtag in this example, even though I checked and my training data contains one instance of #harambe.

However, if I pass it a sentence directly from the training data:

    String nameParagraph = "Idk whats funnier the #harambe or the fact that Im the only one who will see my page https : t.co/2eWjm6mOon ";

It will be able to recognize #harambe by properly identifying it as a HASHTAG.

I want my model to recognize all hashtags, so I don't want to simply feed it more instances of #harambe until it learns that SINGLE hashtag.

Any advice on how I can make my model properly identify new entities that are not in the training set? Thanks in advance!
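One thing that may be worth ruling out is a tokenization mismatch. The training lines have punctuation split into separate tokens (`fighter ,`, `do n't`), while the failing input ends in `you!`. Since `NameFinderME.find` takes an array of tokens, the tokens passed at prediction time probably need to look like the ones seen in training. Here is a rough sketch of a whitespace tokenizer that also peels trailing punctuation off each token. The class name `InputTokenizer` and the punctuation set are hypothetical, and it does not reproduce splits such as `do n't`.

```java
import java.util.ArrayList;
import java.util.List;

class InputTokenizer {
    // Assumed set of trailing punctuation characters to split off.
    static final String PUNCT = ".,!?;:";

    /** Splits on whitespace, then separates trailing punctuation into its own tokens. */
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input produces no tokens
            }
            int end = raw.length();
            // Walk back over trailing punctuation characters.
            while (end > 0 && PUNCT.indexOf(raw.charAt(end - 1)) >= 0) {
                end--;
            }
            if (end == 0) {
                tokens.add(raw); // token is punctuation only (e.g. "..."), keep it whole
                continue;
            }
            tokens.add(raw.substring(0, end));
            for (int i = end; i < raw.length(); i++) {
                tokens.add(String.valueOf(raw.charAt(i)));
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String paragraph = "Take a shot for #harambe he took one for you!";
        System.out.println(String.join(" ", tokenize(paragraph)));
    }
}
```

The resulting array is what would be handed to the name finder instead of the raw string.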
