Chelsea-fc

Reputation: 165

XML format in Stanford POS tagger

I have tagged 20 sentences, and this is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class myTag {

    public static void main(String[] args) {

        // Load the tagger properties file.
        Properties props = new Properties();
        try (FileReader propsReader = new FileReader("D:/tagger/english-bidirectional-distsim.tagger.props")) {
            props.load(propsReader);
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException.
            e.printStackTrace();
        }

        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);

        // Read the input file line by line and tag each line with a separate call.
        try (BufferedReader br = new BufferedReader(new FileReader("C:/Users/chelsea/Desktop/EN/EN.txt"))) {
            String sCurrentLine;
            while ((sCurrentLine = br.readLine()) != null) {
                String tagged = tagger.tagString(sCurrentLine);
                System.out.println(tagged);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This is the output:

[image: XML output of the tagger]

As you can see, the sentence node has an id attribute, and here it is always 0, which it should not be. I expect the values 0, 1, 2, 3, 4, ... I don't understand what is wrong with my code.

Upvotes: 0

Views: 232

Answers (1)

Nikita Astrakhantsev

Reputation: 4749

The Stanford POS tagger (strictly speaking, the sentence splitter that runs before the POS annotator) generates sentence ids per input text. When you ask the tagger to tag sCurrentLine, which contains one sentence, that text is split into sentences (here just one), and that sentence gets id = 0. On the next iteration you pass another text, the next sCurrentLine; it again contains a single sentence, which is again the first sentence of its text, so it also gets id = 0, and so on.

So, if you want the tagger to assign consecutive ids, first build the whole text and then pass it to the tagger in one call. However, if your input is already split into sentences, it is better to leave things as they are (and generate the ids yourself in the loop, if you need them).
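For the "generate the ids yourself in the loop" option, here is a minimal sketch. The class SentenceIds and the method tagWithOwnIds are hypothetical names made up for this illustration, and the tagger is passed in as a plain function so the example runs without the Stanford library; in the question's code that function would presumably be tagger::tagString. The tab-separated output format is also just an assumption, not something the tagger produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of the Stanford tagger API.
class SentenceIds {

    // Tags every non-blank line with a separate call and prefixes the result
    // with a running id that the caller keeps, instead of relying on the
    // per-text ids assigned inside the tagger.
    static List<String> tagWithOwnIds(List<String> lines, UnaryOperator<String> tagOneSentence) {
        List<String> out = new ArrayList<>();
        int id = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not consume an id
            }
            out.add(id + "\t" + tagOneSentence.apply(line));
            id++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("First sentence .", "Second sentence .");
        // Identity function stands in for the real tagger so the sketch is self-contained.
        for (String result : tagWithOwnIds(lines, UnaryOperator.identity())) {
            System.out.println(result);
        }
    }
}
```

With the real tagger, the call would look something like tagWithOwnIds(lines, tagger::tagString), assuming each line holds exactly one sentence as in the question.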

Upvotes: 1
