Reputation: 91
Does CoreNLP have an API for getting ngrams with position etc.?
For example, given the string "I have the best car " with mingrams=1 and maxgrams=2, I should get output like the following. I know StringUtils has an ngram function, but how do I get the position of each ngram?
(I,0)
(I have,0)
(have,1)
(have the,1)
(the,2)
(the best,2)
and so on, based on the string I pass in.
Any help is really appreciated.
Thanks
Upvotes: 1
Views: 587
Reputation: 91
I spent some time rewriting it in Scala; it is just the Java code from the other answer translated to Scala. The output will look like this:
NgramInfo(I,0)NgramInfo(I have,0)NgramInfo(have,1)NgramInfo(have the,1)NgramInfo(the,2)NgramInfo(the best,2)NgramInfo(best,3)NgramInfo(best car,3)NgramInfo(car,4)
Below is the method along with the case class:
import scala.collection.mutable.ListBuffer

case class NgramInfo(term: String, termPosition: Int) extends Serializable

def getNgramPositions(items: List[String], minSize: Int, maxSize: Int): List[NgramInfo] = {
  val ngramList = new ListBuffer[NgramInfo]
  // i is the index of the first token of the ngram
  for (i <- items.indices) {
    // "to" is inclusive, so ngrams of exactly maxSize tokens are produced too
    for (ngramSize <- minSize to maxSize) {
      // skip ngrams that would run past the end of the token list
      if (i + ngramSize <= items.size) {
        ngramList += NgramInfo(items.slice(i, i + ngramSize).mkString(" "), i)
      }
    }
  }
  ngramList.toList
}
Thanks
Upvotes: 1
Reputation: 8739
I don't see anything in the utils. Here is some sample code to help:
import java.util.*;

public class NGramPositionExample {

  // Returns every n-gram of minSize..maxSize tokens. The last element of each
  // inner list is the index of the n-gram's first token, stored as a string.
  public static List<List<String>> getNGramsPositions(List<String> items, int minSize, int maxSize) {
    List<List<String>> ngrams = new ArrayList<>();
    int listSize = items.size();
    for (int i = 0; i < listSize; ++i) {
      for (int ngramSize = minSize; ngramSize <= maxSize; ++ngramSize) {
        // Skip n-grams that would run past the end of the token list.
        if (i + ngramSize <= listSize) {
          List<String> ngram = new ArrayList<>();
          for (int j = i; j < i + ngramSize; ++j) {
            ngram.add(items.get(j));
          }
          ngram.add(Integer.toString(i));
          ngrams.add(ngram);
        }
      }
    }
    return ngrams;
  }

  public static void main(String[] args) {
    String testString = "I have the best car";
    List<String> tokens = Arrays.asList(testString.split(" "));
    List<List<String>> ngramsAndPositions = getNGramsPositions(tokens, 1, 2);
    for (List<String> np : ngramsAndPositions) {
      System.out.println(Arrays.toString(np.toArray()));
    }
  }
}
You can just cut and paste that utility method.
This might be useful functionality to add, so I will put it on our list of things to work on.
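The utility method above stores the position as the last string in each list, so callers have to parse it back out. If that is inconvenient, here is a rough sketch of a typed variant that returns the term and its token index as separate fields. The class name TypedNGramPositions and the NGramInfo holder are made up for this illustration and are not part of CoreNLP; the sketch also assumes minSize is at least 1 and that splitting on whitespace is good enough for tokenization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TypedNGramPositions {

    // Hypothetical holder type (not a CoreNLP class): an n-gram plus the
    // index of its first token in the token list.
    public static final class NGramInfo {
        public final String term;
        public final int position;

        public NGramInfo(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return "(" + term + "," + position + ")";
        }
    }

    // Assumes minSize >= 1. For each start index i, emits the n-grams of
    // minSize..maxSize tokens that fit inside the token list.
    public static List<NGramInfo> getNGramPositions(List<String> items, int minSize, int maxSize) {
        List<NGramInfo> ngrams = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int n = minSize; n <= maxSize && i + n <= items.size(); n++) {
                ngrams.add(new NGramInfo(String.join(" ", items.subList(i, i + n)), i));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // trim() plus a whitespace regex tolerates the trailing space in the question's string
        List<String> tokens = Arrays.asList("I have the best car ".trim().split("\\s+"));
        for (NGramInfo info : getNGramPositions(tokens, 1, 2)) {
            System.out.println(info);
        }
    }
}
```

With minSize=1 and maxSize=2 this prints (I,0), (I have,0), (have,1), (have the,1) and so on, one per line, matching the format asked for in the question.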
Upvotes: 1