I am trying to train a MaltParser model for Bangla. I have annotated a small corpus in CoNLL-U format, but training fails with a null pointer error. So I tried training on some treebanks downloaded from the UD website, and it works on those datasets. My questions are:
1. Can I train a MaltParser model without XPOSTAG? I have annotated the UPOSTAG field, and the XPOSTAG field is just a copy of UPOSTAG. Do I need to annotate XPOSTAG separately? This is the only difference between my treebank and the UD treebanks.
2. Since this is for evaluation purposes, can I automatically convert UPOSTAG to XPOSTAG?
Ref: http://universaldependencies.org/format.html
For clarity, here are examples from both my treebank and a UD treebank.
My example treebank (Bangla; it contains mistakes and some empty fields):
```
1 Ajake _ NOUN NOUN _ 5 iobj _ _
2 rAtera _ NOUN NOUN _ 1 nmod _ _
3 AbahAoYA _ NOUN NOUN _ 5 nsubj _ _
4 kemana _ ADV ADV _ 5 advmod _ _
5 hate _ VERB VERB _ 0 root _ _
6 pAre _ AUX AUX _ 5 aux _ SpaceAfter=No
7 ? _ _ _ _ _ _ _ _

1 Ajake _ NOUN NOUN _ 5 iobj _ _
2 bikAlera _ NOUN NOUN _ 1 nmod _ _
3 paribesha _ NOUN NOUN _ 5 nsubj _ _
4 kemana _ ADV ADV _ 5 advmod _ _
5 hate _ VERB VERB _ 0 root _ _
6 pAre _ AUX AUX _ 5 aux _ SpaceAfter=No
7 ? _ _ _ _ _ _ _ _
```
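Incomplete token lines such as the `?` rows above can be located mechanically before training. The following is a minimal Python sketch, assuming the usual 10-column CoNLL-U layout; treating UPOS, HEAD and DEPREL as the fields that must not be `_` is an assumption, not something taken from MaltParser's documentation, and the function names are hypothetical.

```python
import sys

# Columns (0-based) of a 10-column CoNLL-U token line that this check
# assumes must be filled in: 3 = UPOS, 6 = HEAD, 7 = DEPREL.
REQUIRED = {3: "UPOS", 6: "HEAD", 7: "DEPREL"}


def split_token_line(line):
    """Split on tabs (the CoNLL-U standard); fall back to whitespace for pasted text."""
    return line.split("\t") if "\t" in line else line.split()


def find_blank_fields(lines):
    """Yield (sentence_no, token_id, form, missing) for every incomplete token line."""
    sent_no, in_sentence = 1, False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence (repeated blanks are ignored).
            if in_sentence:
                sent_no, in_sentence = sent_no + 1, False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        in_sentence = True
        cols = split_token_line(line)
        if len(cols) != 10:
            yield sent_no, cols[0], "", [f"expected 10 columns, got {len(cols)}"]
            continue
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes may legitimately use "_"
        missing = [name for idx, name in REQUIRED.items() if cols[idx] == "_"]
        if missing:
            yield sent_no, cols[0], cols[1], missing


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for sent, tok, form, missing in find_blank_fields(f):
            print(f"sentence {sent}, token {tok} ({form!r}): missing {', '.join(missing)}")
```

Run against the Bangla sample, it would report token 7 (`?`) in each sentence as missing UPOS, HEAD and DEPREL.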
UD treebank example:
```
1 From _ ADP IN _ 3 case _ _
2 the _ DET DT _ 3 det _ _
3 AP _ PROPN NNP _ 4 nmod _ _
4 comes _ VERB VBZ _ 0 root _ _
5 this _ DET DT _ 6 det _ _
6 story _ NOUN NN _ 4 nsubj _ _
7 : _ PUNCT : _ 4 punct _ _

1 President _ PROPN NNP _ 2 compound _ _
2 Bush _ PROPN NNP _ 5 nsubj _ _
3 on _ ADP IN _ 4 case _ _
4 Tuesday _ PROPN NNP _ 5 nmod _ _
5 nominated _ VERB VBD _ 0 root _ _
6 two _ NUM CD _ 7 nummod _ _
7 individuals _ NOUN NNS _ 5 dobj _ _
8 to _ PART TO _ 9 mark _ _
9 replace _ VERB VB _ 5 advcl _ _
10 retiring _ VERB VBG _ 11 amod _ _
11 jurists _ NOUN NNS _ 9 dobj _ _
12 on _ ADP IN _ 14 case _ _
13 federal _ ADJ JJ _ 14 amod _ _
14 courts _ NOUN NNS _ 11 nmod _ _
15 in _ ADP IN _ 18 case _ _
16 the _ DET DT _ 18 det _ _
17 Washington _ PROPN NNP _ 18 compound _ _
18 area _ NOUN NN _ 14 nmod _ _
19 . _ PUNCT . _ 5 punct _ _
```
OK, I found the solution to the first problem. You don't need a separate XPOSTAG annotation; duplicating UPOSTAG into the XPOSTAG column is enough for training. My actual problem was that no token may be left blank, including punctuation marks such as the "?" in my questions. Every token has to be POS-tagged and must be attached to the root. Doing this solved my issues.
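Both fixes can be scripted. Below is a minimal Python sketch (function names are hypothetical): it copies UPOS into an empty XPOS column and, for tokens whose UPOS is blank and whose form consists only of Unicode punctuation, sets UPOS to `PUNCT`, attaches the token to the word whose HEAD is `0`, and labels the relation `punct`. The `PUNCT` tag and `punct` label follow UD conventions and are assumptions here; blank *word* tokens are deliberately left untouched, because their POS cannot be guessed.

```python
import sys
import unicodedata


def is_punctuation(form):
    """True if every character of the form is in a Unicode punctuation category (P*)."""
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def fix_sentence(rows):
    """Fill blank XPOS from UPOS and attach untagged punctuation to the root (in place)."""
    root_id = next((r[0] for r in rows if len(r) == 10 and r[6] == "0"), None)
    for r in rows:
        if len(r) != 10 or "-" in r[0] or "." in r[0]:
            continue  # skip malformed lines, multiword-token ranges and empty nodes
        if r[3] == "_" and is_punctuation(r[1]) and root_id is not None:
            r[3] = "PUNCT"          # UPOS
            if r[6] == "_":
                r[6] = root_id      # HEAD: the sentence root
            if r[7] == "_":
                r[7] = "punct"      # DEPREL: UD's label for punctuation
        if r[4] == "_" and r[3] != "_":
            r[4] = r[3]             # XPOS: duplicate UPOS
    return rows


def fix_conllu(lines):
    """Yield corrected CoNLL-U lines (tab-separated, without trailing newlines)."""
    sentence = []
    for raw in list(lines) + [""]:  # the extra "" flushes the final sentence
        line = raw.rstrip("\n")
        if line.startswith("#"):
            yield line
        elif line.strip():
            sentence.append(line.split("\t") if "\t" in line else line.split())
        elif sentence:
            for row in fix_sentence(sentence):
                yield "\t".join(row)
            sentence = []
            yield ""


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for out in fix_conllu(f):
            print(out)
```

Applied to the Bangla sample, the `?` line would become `7 ? _ PUNCT PUNCT _ 5 punct _ _` (tab-separated), attached to `hate`, the root.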
As for the second question, there is no clear answer. There is no valid one-to-one mapping between UPOSTAG and XPOSTAG, because XPOSTAG is language-specific. Any mapping table onto the Penn Treebank tags will work for training, but the result will need post-processing to be accurate.
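A table of that kind might look like the following Python sketch. It assumes one default Penn Treebank tag per UPOS; the mapping is an illustrative approximation, not an official UD-to-PTB conversion (for example, `AUX` to `MD` fits modals such as "can" but not forms of "be"). Because it ignores number, tense and punctuation subtypes, outputs such as `NN` for a plural noun or `.` for a comma are exactly the cases that still need post-processing.

```python
import sys

# Approximate, lossy mapping from UD UPOS tags to Penn Treebank tags.
# Assumption: one default PTB tag per UPOS; finer distinctions (NN vs NNS,
# VB vs VBD/VBZ, "." vs "," for punctuation, TO for "to") are NOT recovered.
UPOS_TO_PTB = {
    "ADJ": "JJ", "ADP": "IN", "ADV": "RB", "AUX": "MD", "CCONJ": "CC",
    "DET": "DT", "INTJ": "UH", "NOUN": "NN", "NUM": "CD", "PART": "RP",
    "PRON": "PRP", "PROPN": "NNP", "PUNCT": ".", "SCONJ": "IN",
    "SYM": "SYM", "VERB": "VB", "X": "FW",
}


def convert_line(line):
    """Overwrite the XPOS column of a CoNLL-U token line with a PTB tag derived from UPOS."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return line  # blank separator or comment: copy through unchanged
    cols = line.split("\t") if "\t" in line else line.split()
    if len(cols) == 10 and cols[3] != "_":
        # Unknown UPOS values fall back to a plain copy of the UPOS tag.
        cols[4] = UPOS_TO_PTB.get(cols[3], cols[3])
    return "\t".join(cols)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for line in f:
            print(convert_line(line))
```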