Tanveer Ahmed

Reputation: 35

How to check the validity of natural language sentence structure using a parser in Java?

I am working on a project, part of which takes a sentence as input and has to check whether it is a valid sentence or not.

For example, if I give the input "I am working at home", the output should be "Valid Sentence", whereas if I give the input "I working home am at", the output should be "Invalid Sentence".

I have looked into natural language processing (NLP) tools such as the Stanford Parser, but it would be helpful if someone could guide me through some Java examples for this kind of problem.

I would be grateful for any help. Thank you.

Upvotes: 2

Views: 1515

Answers (1)

Chthonic Project

Reputation: 8336

Whether you use parse trees or not, you will need to use a Markov process to check validity. The features can be word sequences, part-of-speech tag sequences, parse tree segments (i.e. production rules and their extensions), etc. For these, you would use a tokenizer, a POS tagger and a natural language parser, respectively.

The validity check will also be a probabilistic score, not an absolute truth. All (or almost all) natural language parsers are statistical, which means they require training data. These parsers use context-free grammars or mildly context-sensitive grammars such as CCG or TAG, which are among the best computational approximations of natural language grammars.
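To make the idea of grammar production rules concrete, here is a minimal sketch of a CYK recognizer over a tiny hand-written context-free grammar. Everything in it is an assumption made for illustration: the class name `ToyCykRecognizer`, the five-word lexicon and the four rules are hypothetical, and the recognizer is deliberately non-statistical, so it gives a yes/no answer rather than a score. A real statistical parser learns its rules and their probabilities from training data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy CYK recognizer for a hand-written grammar in Chomsky normal form. */
class ToyCykRecognizer {
    // Hypothetical lexicon: word -> category.
    private static final Map<String, String> LEXICON = new HashMap<>();
    // Hypothetical binary rules: "Left Right" -> parent category.
    private static final Map<String, String> RULES = new HashMap<>();

    static {
        LEXICON.put("i", "NP");
        LEXICON.put("home", "NP");
        LEXICON.put("am", "AUX");
        LEXICON.put("working", "V");
        LEXICON.put("at", "P");

        RULES.put("NP VP", "S");    // S  -> NP VP
        RULES.put("AUX VG", "VP");  // VP -> AUX VG
        RULES.put("V PP", "VG");    // VG -> V PP
        RULES.put("P NP", "PP");    // PP -> P NP
    }

    /** Returns true if the whole sentence can be derived from S. */
    public static boolean recognizes(String sentence) {
        String[] words = sentence.toLowerCase().trim().split("\\s+");
        int n = words.length;

        // chart[i][j] holds the categories that span words i..j (inclusive).
        @SuppressWarnings("unchecked")
        Set<String>[][] chart = new HashSet[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                chart[i][j] = new HashSet<>();
            }
        }

        // Fill the diagonal from the lexicon; unknown words are rejected outright.
        for (int i = 0; i < n; i++) {
            String category = LEXICON.get(words[i]);
            if (category == null) {
                return false;
            }
            chart[i][i].add(category);
        }

        // Combine adjacent spans bottom-up, shortest spans first.
        for (int span = 2; span <= n; span++) {
            for (int start = 0; start + span <= n; start++) {
                int end = start + span - 1;
                for (int split = start; split < end; split++) {
                    for (String left : chart[start][split]) {
                        for (String right : chart[split + 1][end]) {
                            String parent = RULES.get(left + " " + right);
                            if (parent != null) {
                                chart[start][end].add(parent);
                            }
                        }
                    }
                }
            }
        }
        return chart[0][n - 1].contains("S");
    }

    public static void main(String[] args) {
        System.out.println(recognizes("I am working at home"));
        System.out.println(recognizes("I working home am at"));
    }
}
```

With this toy grammar the first sentence is accepted and the second is rejected, but only because the grammar was written for exactly these words. Attaching a probability to each rule and keeping the best-scoring derivation in each chart cell is the usual step from this sketch towards a statistical parser.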

Essentially, the model will tell you how likely it is for a feature to appear in a valid sentence after a certain sequence of features has already been seen. That is, it will allow you to compute probabilities of the form P("at"|"am working") and P("at"|"home am"). The former should be higher than the latter. You will need to determine experimentally how high a probability must be for a sentence to be considered valid.
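As a rough illustration of these conditional probabilities, here is a minimal sketch of a word-level trigram model in plain Java. The class name `TrigramValidity`, the four-sentence training corpus, the whitespace tokenizer and the choice of add-one smoothing are all assumptions made for the example; a real system would train on a large corpus and would probably use a proper tokenizer and a better smoothing method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy trigram model: estimates P(word | two preceding words). */
class TrigramValidity {
    private final Map<String, Integer> trigramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    /** Naive tokenizer (an assumption): lowercase, split on whitespace. */
    private static String[] tokenize(String sentence) {
        return sentence.toLowerCase().trim().split("\\s+");
    }

    /** Adds two start markers and one end marker around the tokens. */
    private static String[] pad(String[] tokens) {
        String[] padded = new String[tokens.length + 3];
        padded[0] = "<s>";
        padded[1] = "<s>";
        System.arraycopy(tokens, 0, padded, 2, tokens.length);
        padded[padded.length - 1] = "</s>";
        return padded;
    }

    /** Counts every trigram and its two-word context in the training sentences. */
    public void train(List<String> sentences) {
        for (String sentence : sentences) {
            String[] w = pad(tokenize(sentence));
            for (int i = 2; i < w.length; i++) {
                String context = w[i - 2] + " " + w[i - 1];
                trigramCounts.merge(context + " " + w[i], 1, Integer::sum);
                contextCounts.merge(context, 1, Integer::sum);
                vocabulary.add(w[i]);
            }
        }
    }

    /** P(word | w1 w2) with add-one (Laplace) smoothing, so unseen events are not zero. */
    public double conditionalProbability(String w1, String w2, String word) {
        String context = w1 + " " + w2;
        int trigram = trigramCounts.getOrDefault(context + " " + word, 0);
        int seenContext = contextCounts.getOrDefault(context, 0);
        return (trigram + 1.0) / (seenContext + vocabulary.size());
    }

    /** Average log-probability per predicted token; higher means more plausible. */
    public double averageLogProbability(String sentence) {
        String[] w = pad(tokenize(sentence));
        double sum = 0.0;
        for (int i = 2; i < w.length; i++) {
            sum += Math.log(conditionalProbability(w[i - 2], w[i - 1], w[i]));
        }
        return sum / (w.length - 2);
    }

    public static void main(String[] args) {
        TrigramValidity model = new TrigramValidity();
        // Hypothetical toy corpus; real training data would be far larger.
        model.train(Arrays.asList(
                "I am working at home",
                "I am working at the office",
                "She is working at home",
                "I am at home"));

        System.out.println(model.conditionalProbability("am", "working", "at"));
        System.out.println(model.conditionalProbability("home", "am", "at"));
        System.out.println(model.averageLogProbability("I am working at home"));
        System.out.println(model.averageLogProbability("I working home am at"));
    }
}
```

On this corpus P("at"|"am working") comes out higher than P("at"|"home am"), and the well-ordered sentence gets a higher average log-probability than the scrambled one. Where to put the cut-off between "valid" and "invalid" is not something the sketch decides; that threshold has to be tuned on held-out examples. The same counting scheme would work on part-of-speech tag sequences instead of words, given a POS tagger to produce the tags.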

As qqlihq commented, these fall under the broad definition of language models. For sentence validity, however, you will usually not need to measure perplexity. The conditional probability measures should suffice.

Upvotes: 3
