Reputation: 1141
I am looking for a parser (or generated parser) in Java that is capable of the following:
Upvotes: 0
Views: 711
Reputation: 3747
The Stanford Parser (which was listed on that other SO question) will do everything you list.
You can provide your own POS tags, but you will need to translate them to the Penn Treebank tag set if they are not already in that format. Parsers are either statistical or they're not. If they're not, you need a set of hand-written grammar rules. Hardly any parsers are built that way anymore, except as toys, because they are really Bad™. So you can rely on the statistical data the Stanford Parser uses, with no additional work from you. This does mean, however, that statistics about your own tags (if they don't map directly to Penn Treebank tags) will be ignored. But since you don't have statistics for your tags anyway, that is to be expected.
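To make the tag translation step concrete, here is a minimal sketch in plain Java. The tags on the left of the map (`NOUN`, `NOUN_PL`, and so on) are a made-up tag set used only for illustration, and the `TagTranslator` class and its method names are hypothetical too; only the Penn Treebank tags on the right are real. It assumes your input comes as `word/TAG` tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: maps a hypothetical custom tag set onto Penn Treebank tags.
final class TagTranslator {

    // Left side: invented custom tags (assumption). Right side: Penn Treebank tags.
    private static final Map<String, String> CUSTOM_TO_PENN = Map.of(
            "NOUN", "NN",
            "NOUN_PL", "NNS",
            "VERB_PAST", "VBD",
            "ADJ", "JJ",
            "DET", "DT");

    // Translate one custom tag; fail loudly if there is no mapping,
    // since an unmapped tag would otherwise be silently mis-parsed.
    static String toPenn(String customTag) {
        String penn = CUSTOM_TO_PENN.get(customTag);
        if (penn == null) {
            throw new IllegalArgumentException("No Penn Treebank mapping for tag: " + customTag);
        }
        return penn;
    }

    // Translate a sentence given as "word/TAG" tokens into "word/PENN_TAG" tokens.
    static List<String> translate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Split on the last slash so words containing '/' still work.
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("Expected word/TAG, got: " + token);
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            out.add(word + "/" + toPenn(tag));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [The/DT, dogs/NNS, barked/VBD]
        System.out.println(translate(List.of("The/DET", "dogs/NOUN_PL", "barked/VERB_PAST")));
    }
}
```

Once the tags are in Penn Treebank form, the sentence can be handed to the parser as pre-tagged input. Check the parser's own documentation for the exact way to do that in your version.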
The Stanford group has parsers trained for several other languages too, but you will need your own tagged data if you want to parse a language they don't have available. There's no getting around that, no matter which parser you use.
If you know Java (and I assume you do), the Stanford Parser is very straightforward and easy to get going. Also, their mailing list is a great resource and is fairly active.
Upvotes: 4
Reputation: 4624
I'm not entirely clear on what you want, but the first thing I thought of was Mallet:
http://mallet.cs.umass.edu/index.php
Upvotes: 1