hrzafer

Reputation: 1141

Java Parser for Natural Language

I am looking for a parser (or a parser generator) in Java that meets the following requirements:

  1. I will provide sentences that are already part-of-speech tagged. I will use my own tag set.
  2. I don't have any statistical data, so if the parser is statistical, I want to be able to use it without that feature.
  3. It should be easy to adapt to other languages and have a gentle learning curve.

Upvotes: 0

Views: 711

Answers (2)

ealdent

Reputation: 3747

The Stanford Parser (which was listed on that other SO question) will do everything you list.

You can provide your own POS tags, but if they are not already in Penn Treebank format you will need to translate them to that tag set. Parsers are either statistical or they're not. If they're not, you need a set of hand-written grammar rules. Hardly any parsers are built this way anymore, except as toys, because they are really Bad™. So you can rely on the statistical data the Stanford Parser already uses (with no additional work from you). This does mean, however, that statistics about your own tags (if they don't map directly to Penn Treebank tags) will be ignored. But since you don't have statistics for your tags anyway, that should be expected.
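To make the tag translation step concrete, here is a minimal sketch in plain Java. It does not use the Stanford Parser API at all; the class name `TagMapper`, the `word/TAG` input format, and the custom tags on the left side of the table are all made up for illustration, so substitute your own tag set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of any library): rewrites tokens tagged with
// a custom tag set so that they carry Penn Treebank tags instead.
class TagMapper {
    // Illustrative mapping only; the custom tags on the left are invented.
    private static final Map<String, String> CUSTOM_TO_PENN = new HashMap<>();
    static {
        CUSTOM_TO_PENN.put("DET", "DT");
        CUSTOM_TO_PENN.put("NOUN_SG", "NN");
        CUSTOM_TO_PENN.put("VERB_3SG", "VBZ");
        CUSTOM_TO_PENN.put("ADJ", "JJ");
    }

    // Each input token is assumed to look like "word/TAG".
    // Tokens without a slash and tags without a mapping pass through unchanged.
    static List<String> toPenn(List<String> tagged) {
        List<String> out = new ArrayList<>();
        for (String token : tagged) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                out.add(token);
                continue;
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            out.add(word + "/" + CUSTOM_TO_PENN.getOrDefault(tag, tag));
        }
        return out;
    }
}
```

Calling `TagMapper.toPenn` on `the/DET dog/NOUN_SG barks/VERB_3SG` would yield `the/DT dog/NN barks/VBZ`. If your tag set is coarser than the Penn Treebank set, a one-to-one table like this cannot recover the finer distinctions, so expect some loss in parse quality.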

They have parsers trained for several other languages too, but you will need your own tagged data if you want to go to a language they don't have available. There's no getting around that, no matter which parser you use.

If you know Java (and I assume you do), the Stanford Parser is very straightforward and easy to get going with. Their mailing list is also a great resource and is fairly active.

Upvotes: 4

Andrew

Reputation: 4624

I'm not entirely clear on what you want, but the first thing I thought of was Mallet:

http://mallet.cs.umass.edu/index.php

Upvotes: 1
