Java Parser for Natural Language

Question

I am looking for a parser (or parser generator) in Java that can do the following:

I will provide sentences that are already part-of-speech tagged, using my own tag set.
I don't have any statistical data, so if the parser is statistical, I want to be able to use it without that feature.
It should be easily adaptable to other languages and have a low learning curve.

ealdent · Accepted Answer

The Stanford Parser (which was listed on that other SO question) will do everything you list.

You can provide your own POS tags, but you will need to translate them to the Penn Treebank tag set if they are not already in that format. Parsers are either statistical or they're not. If they're not, you need a hand-written set of grammar rules. Hardly any parsers are built that way anymore, except as toys, because hand-written grammars are really Bad™ at coping with real-world text. So you can rely on the statistics the Stanford Parser already comes with (no additional work from you). This does mean, however, that any information carried by your own tags that doesn't map directly onto a Penn Treebank tag will be ignored. Since you don't have statistics for your tags anyway, that is to be expected.
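The translation to Penn Treebank tags is usually little more than a lookup table. Below is a minimal sketch, assuming your tagger emits tokens in `word/TAG` form; the class name `TagMapper` and the custom tags (`NOUN_SG`, `VERB_PRES3`, and so on) are hypothetical placeholders for your own tag set. The output is slash-separated `word/TAG` text; how the Stanford Parser is told to accept pre-tagged input depends on its options, so check its documentation for the exact settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TagMapper {
    // Hypothetical custom tag set mapped onto Penn Treebank tags.
    // The left-hand tags are made-up examples; replace them with your own.
    private static final Map<String, String> CUSTOM_TO_PTB = Map.of(
        "NOUN_SG", "NN",
        "NOUN_PL", "NNS",
        "PROPER", "NNP",
        "VERB_PAST", "VBD",
        "VERB_PRES3", "VBZ",
        "DET", "DT",
        "ADJ", "JJ",
        "PREP", "IN",
        "PUNCT_END", "."
    );

    /** Converts "word/CUSTOMTAG" tokens into "word/PTBTAG" tokens. */
    public static List<String> toPennTreebank(List<String> taggedTokens) {
        List<String> result = new ArrayList<>();
        for (String token : taggedTokens) {
            // Split on the last slash so words that contain '/' (e.g. "1/2") survive.
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                throw new IllegalArgumentException("Expected word/TAG but got: " + token);
            }
            String word = token.substring(0, slash);
            String customTag = token.substring(slash + 1);
            String ptbTag = CUSTOM_TO_PTB.get(customTag);
            if (ptbTag == null) {
                // Fail loudly rather than pass an unknown tag through to the parser.
                throw new IllegalArgumentException("No Penn Treebank mapping for tag: " + customTag);
            }
            result.add(word + "/" + ptbTag);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("The/DET", "parser/NOUN_SG", "works/VERB_PRES3", "./PUNCT_END");
        // prints: The/DT parser/NN works/VBZ ./.
        System.out.println(String.join(" ", toPennTreebank(input)));
    }
}
```

Throwing on an unmapped tag is a deliberate choice: silently passing an unknown tag through would hand the parser a tag it has no statistics for. Tags with no clean Penn Treebank equivalent have to be collapsed onto the closest one, which is exactly where the extra distinctions in your own tag set get lost.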

They also have parsers trained for several other languages, but if you want to handle a language they don't cover, you will need your own annotated training data (a treebank of parsed sentences, not just POS-tagged text). There's no getting around that, no matter which parser you use.

If you know Java (and I assume you do), the Stanford Parser is very straightforward and easy to get up and running. Their mailing list is also a great resource and fairly active.
