Reputation: 704
I'm working with MaltParser and NLTK to process text. I have an integration between MaltParser and NLTK that works fine, but every time I run the program NLTK starts a Java VM, and that takes a lot of time. So I'm thinking of building a web service that takes a CoNLL .txt file and returns the CoNLL output parsed by the Java app.
The problem comes when I test the examples from the MaltParser sources. I picked the one that just initializes a model and parses an array of tokens. I only changed the model to the regular English one (engmalt.linear-1.7.mco). When I run it, it returns the sentences exactly as they were input.
The code is this:
public static void main(String[] args) {
// Loading the English model engmalt.linear-1.7 (variable names are left over from the Swedish example)
ConcurrentMaltParserModel model = null;
try {
URL swemaltMiniModelURL = new File("inputs/engmalt.linear-1.7.mco").toURI().toURL();
System.out.println(swemaltMiniModelURL.getFile());
model = ConcurrentMaltParserService.initializeParserModel(swemaltMiniModelURL);
} catch (Exception e) {
e.printStackTrace();
}
// Creates an array of tokens, which contains the English sentence 'This is a test.'
// in the CoNLL data format.
String[] tokens = new String[5];
tokens[0] = "1\tThis\t_\tDT\tDT\t_\t0\ta\t_\t_";
System.out.println(tokens[0]);
tokens[1] = "2\tis\t_\tVBZ\tVBZ\t_\t0\ta\t_\t_";
System.out.println(tokens[1]);
tokens[2] = "3\ta\t_\tZ\tZ\t_\t0\ta\t_\t_";
System.out.println(tokens[2]);
tokens[3] = "4\ttest\t_\tNN\tNN\t_\t0\ta\t_\t_";
System.out.println(tokens[3]);
tokens[4] = "5\t.\t_\tFp\tFp\t_\t0\ta\t_\t_";
System.out.println(tokens[4]);
try {
String[] outputTokens = model.parseTokens(tokens);
ConcurrentUtils.printTokens(outputTokens);
} catch (Exception e) {
e.printStackTrace();
}
}
And the output is (the five input tokens as printed, followed by the parser output):
/home/tomas/workspace/PruebaMalt/inputs/engmalt.linear-1.7.mco
1 This _ DT DT _ 0 a _ _
2 is _ VBZ VBZ _ 0 a _ _
3 a _ Z Z _ 0 a _ _
4 test _ NN NN _ 0 a _ _
5 . _ Fp Fp _ 0 a _ _
1 This _ DT DT _ 0 a _ _
2 is _ VBZ VBZ _ 0 a _ _
3 a _ Z Z _ 0 a _ _
4 test _ NN NN _ 0 a _ _
5 . _ Fp Fp _ 0 a _ _
I tried other models and languages and got the same result. Any suggestions? Thanks!
Upvotes: 2
Views: 122
Reputation: 704
I figured it out myself. The problem is that NLTK sends tokens to Java in this format:
1 This _ DT DT _ 0 a _ _
and gets back: 1 This _ DT DT _ 2 SUBJ _ _
But the Java API expects a slightly different format: the trailing columns (0 a _ _) have to be removed, leaving only the first six columns. With that, it works!
input: 1 This _ DT DT _
return: 1 This _ DT DT _ 2 SUBJ _ _
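If the tokens arrive in the 10-column form that NLTK produces, the trailing columns can be stripped before handing the array to parseTokens. The following is only a minimal sketch of that idea: the class and method names (ConllTrim, trimToken, trimTokens) are made up for illustration and are not part of MaltParser or NLTK, and it assumes the columns are tab-separated as in the code from the question.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MaltParser: keeps only the first six
// CoNLL columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS) of each token.
public class ConllTrim {
    private static final int INPUT_COLUMNS = 6;

    // Drops every column after the sixth; shorter lines are returned unchanged.
    public static String trimToken(String token) {
        String[] cols = token.split("\t");
        if (cols.length <= INPUT_COLUMNS) {
            return token;
        }
        return String.join("\t", Arrays.copyOf(cols, INPUT_COLUMNS));
    }

    // Applies trimToken to every token of a sentence.
    public static String[] trimTokens(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = trimToken(tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {
            "1\tThis\t_\tDT\tDT\t_\t0\ta\t_\t_",
            "2\tis\t_\tVBZ\tVBZ\t_\t0\ta\t_\t_"
        };
        // Each printed line now has six tab-separated columns.
        for (String token : trimTokens(tokens)) {
            System.out.println(token);
        }
    }
}
```

In the program from the question, the call would then become something like model.parseTokens(ConllTrim.trimTokens(tokens)).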
I hope this helps others.
Upvotes: 1