Vinoj Mathew

Reputation: 91

CoreNLP API for N-grams with position

Does CoreNLP have an API for getting ngrams with position etc.?

For example, I have the string "I have the best car". With mingrams=1 and maxgrams=2, I should get output like the following. I know StringUtils has an n-gram function, but how do I get the position?

(I,0)
(I have,0)
(have,1)
(have the,1)
(the,2)
(the best,2)

and so on, based on the string I pass in.

Any help is really appreciated.

Thanks

Upvotes: 1

Views: 587

Answers (2)

Vinoj Mathew

Reputation: 91

I spent some time rewriting it in Scala. It is just the Java code from the other answer, ported to Scala. For the example string with a minimum size of 1 and a maximum size of 2, the output looks like this:

NgramInfo(I,0)NgramInfo(I have,0)NgramInfo(have,1)NgramInfo(have the,1)NgramInfo(the,2)NgramInfo(the best,2)NgramInfo(best,3)NgramInfo(best car,3)NgramInfo(car,4) 

Below is the method, along with the case class it returns:

   import scala.collection.mutable.ListBuffer

   // Returns every n-gram of minSize..maxSize tokens, paired with the index of its first token.
   def getNgramPositions(items: List[String], minSize: Int, maxSize: Int): List[NgramInfo] = {
     val ngramList = new ListBuffer[NgramInfo]
     for (i <- items.indices) {
       // `to` is inclusive, so n-grams of exactly maxSize tokens are produced as well
       for (ngramSize <- minSize to maxSize) {
         if (i + ngramSize <= items.size) {
           // slice(i, i + ngramSize) takes exactly ngramSize tokens starting at i
           ngramList += NgramInfo(items.slice(i, i + ngramSize).mkString(" "), i)
         }
       }
     }
     ngramList.toList
   }

case class NgramInfo(term: String, termPosition: Int) extends Serializable

Thanks

Upvotes: 1

StanfordNLPHelp

Reputation: 8739

I don't see anything in the utils. Here is some sample code to help:

import java.io.IOException;
import java.util.*;


public class NGramPositionExample {

    public static List<List<String>> getNGramsPositions(List<String> items, int minSize, int maxSize) {
        List<List<String>> ngrams = new ArrayList<List<String>>();
        int listSize = items.size();
        for (int i = 0; i < listSize; ++i) {
            for (int ngramSize = minSize; ngramSize <= maxSize; ++ngramSize) {
                if (i + ngramSize <= listSize) {
                    List<String> ngram = new ArrayList<String>();
                    for (int j = i; j < i + ngramSize; ++j) {
                        ngram.add(items.get(j));
                    }
                    ngram.add(Integer.toString(i));
                    ngrams.add(ngram);
                }
            }
        }
        return ngrams;
    }

    public static void main(String[] args) throws IOException {
        String testString = "I have the best car";
        List<String> tokens = Arrays.asList(testString.split(" "));
        List<List<String>> ngramsAndPositions = getNGramsPositions(tokens, 1, 2);
        for (List<String> np : ngramsAndPositions) {
            System.out.println(Arrays.toString(np.toArray()));
        }
    }
}

You can just cut and paste that utility method.

This might be useful functionality to add, so I will put it on our list of things to work on.
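
Note that the utility method stores the position as the last string in each list, so callers have to split it off again. If that is awkward, one possible variation is to return a small holder type instead. The following is only a sketch in plain Java with no CoreNLP dependency: NGramPositionSketch, NGram, and ngramsWithPositions are made-up names for illustration, not part of the CoreNLP API, and it assumes minSize is at least 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramPositionSketch {

    // Hypothetical holder type (not a CoreNLP class):
    // the n-gram text plus the index of its first token.
    static final class NGram {
        final String text;
        final int position;

        NGram(String text, int position) {
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return "(" + text + "," + position + ")";
        }
    }

    // Collects all n-grams of minSize..maxSize tokens, grouped by start index.
    // Assumes minSize >= 1.
    static List<NGram> ngramsWithPositions(List<String> tokens, int minSize, int maxSize) {
        List<NGram> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Stop growing the n-gram once it would run past the last token.
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                String text = String.join(" ", tokens.subList(start, start + size));
                result.add(new NGram(text, start));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I have the best car".split(" "));
        for (NGram ngram : ngramsWithPositions(tokens, 1, 2)) {
            System.out.println(ngram);
        }
    }
}
```

With the question's example string, a minimum size of 1, and a maximum size of 2, this prints one pair per line in the order the question asks for, starting with (I,0) and (I have,0) and ending with (car,4).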

Upvotes: 1
