Reputation: 4660
I am looking at sentences such as "Bachelors Degree in early childhood teaching, psychology", and I want to separate "early childhood teaching" from "psychology".
My code for this procedure loops through the tokens of the triple's object and keeps the phrase only if every token meets certain POS requirements.
private void processTripleObject(List<CoreLabel> objectPhrase)
{
    try
    {
        StringBuilder sb = new StringBuilder();
        for (CoreLabel token : objectPhrase)
        {
            String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
            TALog.getLogger().debug("pos: " + pos + " word " + token.word());
            // Discard the whole phrase as soon as one token fails the POS check
            if (!matchDegreeNameByPos(pos))
            {
                return;
            }
            sb.append(token.word());
            sb.append(SPACE);
        }
        // Every token passed, so keep the joined phrase
        IdentifiedToken itoken = new IdentifiedToken(IdentifiedToken.SKILL, sb.toString());
    }
    catch (Exception e)
    {
        TALog.getLogger().error(e.getMessage(), e);
    }
}
Since the comma between teaching and psychology is not in the tokens, I don't know how to recognize the divide.
Can anyone advise?
Upvotes: 1
Views: 670
Reputation: 22234
Note that for a punctuation token such as a comma, token.get(CoreAnnotations.PartOfSpeechAnnotation.class) returns the token itself (",") rather than a word-class tag. I tested this with CoreNLP 3.7.0 and the "tokenize ssplit pos" annotators. You can then check whether pos appears in a String of the punctuation marks you are interested in. E.g. this is some code I just tested:
String punctuations = ".,;!?";
for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // pos could be "NN" but could also be ","
        String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
        if (punctuations.contains(pos)) {
            // do something with it
        }
    }
}
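Building on that check, here is a minimal sketch of how the object phrase could be split at the comma. It is only an illustration, not tested against CoreNLP itself: TaggedToken is a hypothetical stand-in for CoreLabel so the snippet runs on its own, and PhraseSplitter and splitOnComma are names made up for this example. It assumes the comma is present in the token list and carries "," as its POS tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CoreLabel: holds only a word and its POS tag.
final class TaggedToken {
    final String word;
    final String pos;

    TaggedToken(String word, String pos) {
        this.word = word;
        this.pos = pos;
    }
}

final class PhraseSplitter {
    // Assumption: the tagger labels a comma token with "," as its POS tag.
    private static final String COMMA_TAG = ",";

    // Splits the tokens into space-joined phrases, starting a new phrase at each comma.
    static List<String> splitOnComma(List<TaggedToken> tokens) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (TaggedToken token : tokens) {
            if (COMMA_TAG.equals(token.pos)) {
                // Comma reached: close the current phrase, skipping empty ones.
                if (current.length() > 0) {
                    phrases.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token.word);
            }
        }
        // Close the last phrase, which has no trailing comma.
        if (current.length() > 0) {
            phrases.add(current.toString());
        }
        return phrases;
    }
}
```

With real CoreLabel tokens, the same loop would read token.get(CoreAnnotations.PartOfSpeechAnnotation.class) and token.word() instead of the two fields, and each resulting phrase could then go through the existing POS check separately.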
Upvotes: 2