Tzomas

Reputation: 704

MaltParser doesn't do anything

I'm working with MaltParser and NLTK to process texts. I have an integration between MaltParser and NLTK that works fine, but every time I run the program NLTK starts a Java VM, which takes a lot of time. So I'm thinking of building a web service that takes a CoNLL .txt file and returns the CoNLL output parsed by a Java app.

The problem comes when I test the examples from the MaltParser sources. I picked one that just initializes a model and parses an array of tokens. The only thing I changed was the model, to the regular English one (engmalt.linear-1.7.mco). When I run it, it returns the sentence exactly as it was input.

This is the code:

public static void main(String[] args) {
    // Loading the English model engmalt.linear-1.7
    ConcurrentMaltParserModel model = null;
    try {
        URL modelURL = new File("inputs/engmalt.linear-1.7.mco").toURI().toURL();
        System.out.println(modelURL.getFile());
        model = ConcurrentMaltParserService.initializeParserModel(modelURL);
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Creates an array of tokens, which contains the English sentence 'This is a test.'
    // in the CoNLL data format.
    String[] tokens = new String[5];
    tokens[0] = "1\tThis\t_\tDT\tDT\t_\t0\ta\t_\t_";
    System.out.println(tokens[0]);
    tokens[1] = "2\tis\t_\tVBZ\tVBZ\t_\t0\ta\t_\t_";
    System.out.println(tokens[1]);
    tokens[2] = "3\ta\t_\tZ\tZ\t_\t0\ta\t_\t_";
    System.out.println(tokens[2]);
    tokens[3] = "4\ttest\t_\tNN\tNN\t_\t0\ta\t_\t_";
    System.out.println(tokens[3]);
    tokens[4] = "5\t.\t_\tFp\tFp\t_\t0\ta\t_\t_";
    System.out.println(tokens[4]);
    try {
        String[] outputTokens = model.parseTokens(tokens);
        ConcurrentUtils.printTokens(outputTokens);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

and the output is:

/home/tomas/workspace/PruebaMalt/inputs/engmalt.linear-1.7.mco
1   This    _   DT  DT  _   0   a   _   _
2   is  _   VBZ VBZ _   0   a   _   _
3   a   _   Z   Z   _   0   a   _   _
4   test    _   NN  NN  _   0   a   _   _
5   .   _   Fp  Fp  _   0   a   _   _
1   This    _   DT  DT  _   0   a   _   _
2   is  _   VBZ VBZ _   0   a   _   _
3   a   _   Z   Z   _   0   a   _   _
4   test    _   NN  NN  _   0   a   _   _
5   .   _   Fp  Fp  _   0   a   _   _

I tried other models and languages and got the same result. Any suggestions? Thanks!

Upvotes: 2

Views: 122

Answers (1)

Tzomas

Reputation: 704

I figured it out myself. The problem is that NLTK sends this format to Java:

1 This _ DT DT _ 0 a _ _

and gets back:

1 This _ DT DT _ 2 SUBJ _ _

But the Java API expects a slightly different input format: the trailing columns (0 a _ _ here) have to be removed, so each token has only its first six fields. With that, it works!

input: 1 This _ DT DT _

return: 1 This _ DT DT _ 2 SUBJ _ _

I hope this helps others.
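For anyone who already has full ten-column CoNLL lines (for example, the ones NLTK produces), here is a minimal sketch of trimming them down to the six input columns before parsing. The class and method names (ConllInputColumns, stripToInputColumns, stripAll) are made up for illustration and are not part of MaltParser or NLTK; the sketch also assumes the fields are tab-separated, as in the question.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MaltParser or NLTK.
final class ConllInputColumns {

    // Leading CoNLL-X columns kept as parser input:
    // ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS.
    static final int INPUT_COLUMNS = 6;

    // Keeps only the first six tab-separated fields of one token line.
    // Lines that already have six or fewer fields are returned unchanged.
    static String stripToInputColumns(String token) {
        String[] fields = token.split("\t", -1);
        int keep = Math.min(fields.length, INPUT_COLUMNS);
        return String.join("\t", Arrays.copyOf(fields, keep));
    }

    // Applies stripToInputColumns to every token of a sentence.
    static String[] stripAll(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = stripToInputColumns(tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {
            "1\tThis\t_\tDT\tDT\t_\t0\ta\t_\t_",
            "2\tis\t_\tVBZ\tVBZ\t_\t0\ta\t_\t_"
        };
        // Print the trimmed lines; this array is what would be handed
        // to model.parseTokens(...) in the question's code.
        for (String line : stripAll(tokens)) {
            System.out.println(line);
        }
    }
}
```

In the question's code, that would mean calling model.parseTokens(ConllInputColumns.stripAll(tokens)) instead of model.parseTokens(tokens).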

Upvotes: 1
