Reputation: 360
I'm looking for syntactic examples or common techniques for doing regular-expression-style transformations on words instead of characters, in a procedural language.
For example, to trace copying, one would want to create a document with similar meaning but with different word choices.
I'd like to be able to concisely define these possible transformations that I can apply to a text stream.
E.g. "fast noun" to "rapid noun", but "go fast." wouldn't get transformed (no noun afterwards).
Or: "Alice will sing song" to "song will be sung by Alice"
I'd expect this to be done in grammatical checkers, such as detecting passive voice.
A C# implementation for this sort of language processing would be really neat, but I think the bulk of any effort is coming up with the right rules. Keeping the rules clear and understandable seems like a good place to begin.
Upvotes: 7
Views: 1072
Reputation: 29962
If you aren't tied to a particular language, Haskell has Aarne Ranta's Grammatical Framework:
http://www.grammaticalframework.org/
which is explicitly designed to generate parsers, etc., for natural language processing of this sort.
Upvotes: 2
Reputation: 43975
I am not aware of any existing syntax for the kind of English language processing you describe. You would need to create your own DSL on top of one of the available resources (such as WordNet).
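To give a rough idea of what such a home-grown rule DSL could look like, here is a minimal Python sketch, not a real library. It assumes a tiny hand-written part-of-speech table (`LEXICON`) in place of a real tagger, and the names `apply_rule`, `RULE`, and `transform` are made up for illustration. A pattern is a list of literal lowercase words or uppercase tags; a replacement is a list of literal words or integer back-references into the match.

```python
import re

# Hypothetical toy lexicon (word -> part-of-speech tag).
# A real system would use a proper tagger or a resource such as WordNet.
LEXICON = {
    "fast": "ADJ", "rapid": "ADJ",
    "car": "NOUN", "train": "NOUN", "song": "NOUN",
    "go": "VERB", "the": "DET", "a": "DET",
}

def tokenize(text):
    # Split into words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def tag(token):
    # Unknown words and punctuation get the catch-all tag "X".
    return LEXICON.get(token.lower(), "X")

def matches(spec, token):
    # An uppercase spec is a POS tag; anything else is a literal word.
    if spec.isupper():
        return tag(token) == spec
    return token.lower() == spec

def apply_rule(tokens, pattern, replacement):
    """Replace each non-overlapping match of `pattern` in `tokens`.

    `replacement` items are literal strings, or ints that index
    into the matched window (a word-level back-reference).
    """
    out, i, n = [], 0, len(pattern)
    while i < len(tokens):
        window = tokens[i:i + n]
        if len(window) == n and all(matches(s, t) for s, t in zip(pattern, window)):
            out.extend(window[r] if isinstance(r, int) else r for r in replacement)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out

def detokenize(tokens):
    # Join with spaces, then re-attach punctuation to the preceding word.
    return re.sub(r"\s+([.,!?;:])", r"\1", " ".join(tokens))

# "fast NOUN" -> "rapid NOUN": the int 1 copies the matched noun through.
RULE = (["fast", "NOUN"], ["rapid", 1])

def transform(text):
    return detokenize(apply_rule(tokenize(text), *RULE))

print(transform("the fast train"))  # the rapid train
print(transform("go fast."))        # go fast.
```

The second call is left unchanged because the token after "fast" is punctuation, not a noun. Reordering rules like the passive-voice example would need more than a fixed-width window (verb inflection, variable-length noun phrases), so treat this only as a starting point for the rule format.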
Upvotes: -1
Reputation: 7769
You could try Jason Rennie's WordNet::QueryData Perl module (distributed on CPAN as WordNet-QueryData-1.47).
Upvotes: 3
Reputation: 44307
One good place to start researching would be WordNet. It's a dictionary of semantics that groups words together by similar meaning and also records the relationships between words in useful ways.
There are a bunch of software projects leveraging the WordNet corpus; one of them may be what you need.
Upvotes: 2
Reputation: 18237
If you want something more robust for natural language parsing/transforming, you could try the C# port of OpenNLP.
Upvotes: 0
Reputation: 7769
A good place to start would be SIL's CARLAStudio, with its "Computer Assisted Related Language Adaptation" suite, or alternatively SIL's Adapt It. SIL has a huge range of linguistic analysis software, which is the direction you appear to be going. It's certainly a big jump from regular expressions, which don't care about meaning, to something that can handle linguistic analysis.
Upvotes: 0