Renaud

Reputation: 16521

sentence identification/detection: decide whether some text is a sentence or not

Most sentence splitters are able to split a stream of text at the correct position.

I am looking for a model that will decide whether some text is a sentence or not.

Upvotes: 0

Views: 607

Answers (1)

dhg

Reputation: 52701

Easy solution: Use a parser (for example, the Stanford Parser, which is free and written in Java, but there are many options) to parse the text. If the parser returns a parse tree (i.e., if it finds some appropriate structure), then call it a sentence. If it doesn't, then say it's not. This approach requires no extra effort on your part.

The caveat is that, by its very nature, a statistical parser may return a "best guess" parse for a sentence that is actually ungrammatical. Thus, it is possible for an ungrammatical sentence to show up as "ok" under this scheme.

If, on the other hand, you want to be very specific about what is or is not proper grammar according to your system, you could write your own context-free grammar (CFG) and then use a CFG-based parser to parse the sentence (you could use an existing one or implement the CKY algorithm yourself). This will tell you precisely whether the sentence meets the grammatical specification you provided or not.
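To make the CFG route concrete, here is a minimal sketch of a CKY recognizer in Python. The grammar, the lexicon, and the function name `is_sentence` are all made up for illustration; a real grammar would be far larger. It also assumes the grammar is already in Chomsky normal form (every rule is either A -> B C or A -> word), which is what CKY requires.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (illustration only).
# Binary rules: (B, C) -> set of A such that A -> B C
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

# Lexical rules: word -> set of preterminals A such that A -> word
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
    "chases": {"V"},
}


def is_sentence(text, start="S"):
    """Return True if the toy grammar derives `text` from `start`."""
    # Naive tokenization: lowercase, drop trailing punctuation, split on spaces.
    words = text.lower().rstrip(".!?").split()
    n = len(words)
    if n == 0:
        return False

    # chart[i][j] holds the nonterminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Fill in spans of length 1 from the lexicon (unknown words get no labels).
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))

    # Build longer spans from pairs of adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())

    return start in chart[0][n]


print(is_sentence("The dog chases a cat."))
print(is_sentence("Dog the chases."))
```

With this grammar the first call is accepted and the second is rejected. Unlike a statistical parser, there is no "best guess": any word missing from the lexicon or any structure not covered by the rules makes the whole input fail, so the result is only as good as the grammar you write.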

Of course, this question touches on the dangerous subject of "what does it mean to be a sentence," which many linguists will fight you over. It also sidesteps the issue of grammatical sentences that don't seem to mean anything, such as "Colorless green ideas sleep furiously," along with a zillion other semantic issues.

Upvotes: 4
