Reputation: 41951
I am working on a Dutch corpus and I want to know whether NLTK has a Dutch grammar built in so that I can parse my sentences. More generally, does NLTK only work on English? I know that it includes the Alpino Dutch corpus, but there is no indication that its functions (such as parsing with CFGs) are also meant for Dutch. Thanks.
Upvotes: 6
Views: 2760
Reputation: 41951
This is the reply to my email from Steven Bird, one of the authors of the NLTK book:
NLTK can work for parsing Dutch if you supply the grammar rules. Please consult the NLTK book for guidance: http://www.nltk.org/book You might be able to use the Alpino corpus in order to develop the grammar (or to train a statistical parser). If your primary interest is obtaining parsed sentences of Dutch, I recommend that you try to find an existing parser rather than developing your own.
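To illustrate what "supplying the grammar rules" could look like, here is a minimal sketch using NLTK's generic CFG machinery. The grammar is a toy one I made up for a single sentence (the rules and the vocabulary are assumptions for illustration, not something that ships with NLTK or comes from Alpino), so treat it as a starting point rather than a real Dutch grammar.

```python
import nltk

# Toy Dutch grammar: hypothetical rules written for illustration only,
# not taken from NLTK or from the Alpino treebank.
dutch_grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'de' | 'het'
N -> 'man' | 'boek'
V -> 'leest'
""")

# ChartParser is language-agnostic: it only needs a grammar and tokens.
parser = nltk.ChartParser(dutch_grammar)
tokens = "de man leest het boek".split()

# Collect every parse the grammar licenses for this sentence.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Nothing in the parser itself is specific to English; the coverage you get depends entirely on the rules you write, which is why a hand-built grammar quickly becomes a lot of work for real text.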
In the end I used the Alpino parser, which is really strong and written in Prolog, but I managed to port it (the binary version) to Python.
Upvotes: 2
Reputation: 868
I do not have a straightforward answer, but by combining the information on the following two pages you should be able to work it out. The first gives an overview of the high-level parsing interface in NLTK. Parsers require a model, which, if one exists, would be listed on the second page: the documentation of the data packages that ship with NLTK.
As you already know, the Alpino Dutch Treebank ships with NLTK, so in the worst case you should be able to learn a model yourself (the parser API also provides learning facilities).
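As a rough sketch of what learning a model from a treebank might look like, the snippet below induces a probabilistic CFG from parsed trees and parses a new sentence with it. The helper `learn_pcfg` and the two-tree `toy_treebank` are hypothetical, made up so the example runs without downloading anything. With the real data you would presumably pass in the trees from `nltk.corpus.alpino.parsed_sents()` instead, and you would need to check which root label those trees use before choosing the start symbol.

```python
from nltk.grammar import Nonterminal, induce_pcfg
from nltk.parse import ViterbiParser
from nltk.tree import Tree


def learn_pcfg(trees, start="S"):
    """Induce a PCFG from parsed trees (hypothetical helper, not an NLTK API)."""
    productions = []
    for tree in trees:
        # Each tree contributes the rewrite rules that were used to build it.
        productions.extend(tree.productions())
    # Rule probabilities are estimated from relative frequencies.
    return induce_pcfg(Nonterminal(start), productions)


# Stand-in for a treebank: two hand-written trees, for illustration only.
toy_treebank = [
    Tree.fromstring(
        "(S (NP (Det de) (N man)) (VP (V leest) (NP (Det het) (N boek))))"
    ),
    Tree.fromstring("(S (NP (Det de) (N vrouw)) (VP (V slaapt)))"),
]

grammar = learn_pcfg(toy_treebank)
parser = ViterbiParser(grammar)

# A sentence that is not in the toy treebank but is covered by its rules.
tokens = "de vrouw leest het boek".split()
best = next(parser.parse(tokens))
print(best)
```

A grammar induced this way only covers words and constructions seen in the training trees, so on real text you would still have to deal with unknown words and the size of the induced grammar.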
Hope it helps somehow.
Upvotes: 0