Reputation: 165
I have tagged 20 sentences, and this is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class myTag {
    public static void main(String[] args) {
        // Load the tagger configuration from the .props file.
        Properties props = new Properties();
        try (FileReader propsReader = new FileReader("D:/tagger/english-bidirectional-distsim.tagger.props")) {
            props.load(propsReader);
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }
        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);

        // Read the input file line by line and tag each line.
        try (BufferedReader br = new BufferedReader(new FileReader("C:/Users/chelsea/Desktop/EN/EN.txt"))) {
            String sCurrentLine;
            while ((sCurrentLine = br.readLine()) != null) {
                String tagged = tagger.tagString(sCurrentLine);
                System.out.println(tagged);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This is the output:
As you can see, the sentence node has an Id attribute, and here it is always 0, which it should not be. I expect the values 0, 1, 2, 3, 4, ... I don't understand what is wrong with my code.
Upvotes: 0
Views: 232
Reputation: 4749
The Stanford POS tagger (strictly speaking, the sentence splitter that is applied before the POS annotator) generates sentence ids per input text.
So, you ask tagger to tag sCurrentLine, which consists of one sentence. This text is split into sentences (actually just one, with id = 0). Then you ask it to tag another text, sCurrentLine from the next iteration, and it again contains only one sentence, which is therefore the first sentence with id = 0, and so on.
Thus, if you want correct ids, first build the whole text, then pass it to tagger. However, if your input text is already split into sentences, it is better to leave things as they are (and generate the ids yourself in the loop, if you need them).
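A minimal sketch of the second option, numbering the sentences yourself in the loop. It assumes one sentence per line. The names `SentenceIds` and `tagWithIds` and the tab-separated output are made up for illustration, and the `UnaryOperator<String>` parameter is only a stand-in so the snippet runs without the Stanford jars; with the code from the question you would pass `tagger::tagString` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class SentenceIds {

    // Hypothetical helper: tags every non-blank line separately and prefixes
    // the result with our own running id, since the tagger restarts its ids
    // at 0 for every input text it receives.
    static List<String> tagWithIds(List<String> lines, UnaryOperator<String> tagger) {
        List<String> result = new ArrayList<>();
        int id = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines are skipped and do not consume an id
            }
            result.add(id + "\t" + tagger.apply(line));
            id++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for tagger::tagString (assumption, not the real tagger).
        UnaryOperator<String> fakeTagger = s -> s + " [tagged]";

        List<String> lines = Arrays.asList("First sentence.", "", "Second sentence.");
        for (String row : tagWithIds(lines, fakeTagger)) {
            System.out.println(row);
        }
    }
}
```

If the output has to be XML, the same counter can be written into whatever wrapper you build around each tagged line; the point is only that the id comes from your loop, not from the tagger.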
Upvotes: 1