Procedural Throwback

Reputation: 360

A "regex for words" (semantic replacement): any example syntax or libraries?

I'm looking for syntactic examples or common techniques for doing regular-expression-style transformations on words instead of characters, in a procedural language.

For example, to trace copying, one might want to create a version of a document with the same meaning but different word choices.

I'd like to be able to define these transformations concisely and then apply them to a text stream.

E.g. "fast noun" to "rapid noun", but "go fast." wouldn't get transformed (no noun follows).
Or: "Alice will sing song" to "song will be sung by Alice".
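A minimal sketch of the idea in Python (not C#, and not any existing library's API): each token carries a part-of-speech tag, a pattern is a list of literal words or `<TAG>` placeholders, and a replacement refers back to matched positions much like regex backreferences. The tiny `LEXICON` and `PARTICIPLE` tables are hypothetical stand-ins for a real tagger and morphology data.

```python
from typing import Callable, Dict, List, Tuple, Union

Token = Tuple[str, str]  # (word, part-of-speech tag)
Elem = Union[str, int, Callable[[List[Token]], str]]

# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
LEXICON: Dict[str, str] = {
    "fast": "ADJ", "rapid": "ADJ", "car": "NOUN", "song": "NOUN",
    "go": "VERB", "sing": "VERB", "will": "AUX", "alice": "PROPN", ".": "PUNCT",
}
# Hypothetical morphology table: base verb -> past participle.
PARTICIPLE: Dict[str, str] = {"sing": "sung"}


def tag(text: str) -> List[Token]:
    """Whitespace-tokenize, split off a trailing period, and tag each word."""
    tokens: List[Token] = []
    for raw in text.split():
        parts = [raw[:-1], "."] if raw.endswith(".") and len(raw) > 1 else [raw]
        for word in parts:
            tokens.append((word, LEXICON.get(word.lower(), "X")))  # "X" = unknown
    return tokens


def matches(pattern: List[str], tokens: List[Token], start: int) -> bool:
    """True if `pattern` matches the tokens beginning at index `start`."""
    if start + len(pattern) > len(tokens):
        return False
    for elem, (word, pos) in zip(pattern, tokens[start:]):
        if elem.startswith("<") and elem.endswith(">"):
            if pos != elem[1:-1]:  # "<NOUN>" matches any token tagged NOUN
                return False
        elif word.lower() != elem.lower():  # plain strings match the word itself
            return False
    return True


def apply_rule(text: str, rule: Tuple[List[str], List[Elem]]) -> str:
    """Scan left to right, replacing each non-overlapping match of the rule."""
    pattern, replacement = rule
    tokens = tag(text)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if matches(pattern, tokens, i):
            matched = tokens[i:i + len(pattern)]
            for r in replacement:
                if isinstance(r, str):
                    out.append(r)              # literal word
                elif isinstance(r, int):
                    out.append(matched[r][0])  # backreference to a matched word
                else:
                    out.append(r(matched))     # word computed from the match
            i += len(pattern)
        else:
            out.append(tokens[i][0])
            i += 1
    return " ".join(out).replace(" .", ".")


# "fast <NOUN>" -> "rapid <NOUN>"; nothing happens when no noun follows.
SYNONYM_RULE = (["fast", "<NOUN>"], ["rapid", 1])
# "<PROPN> will <VERB> <NOUN>" -> "<NOUN> will be <participle> by <PROPN>"
PASSIVE_RULE = (
    ["<PROPN>", "will", "<VERB>", "<NOUN>"],
    [3, "will", "be", lambda m: PARTICIPLE[m[2][0].lower()], "by", 0],
)

print(apply_rule("a fast car", SYNONYM_RULE))            # a rapid car
print(apply_rule("go fast.", SYNONYM_RULE))              # go fast.
print(apply_rule("Alice will sing song", PASSIVE_RULE))  # song will be sung by Alice
```

Because the `<NOUN>` slot must match, "go fast." is left alone. Keeping each rule as plain data (a pattern plus a replacement) is one way to keep the rule set readable as it grows.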

I'd expect grammar checkers to do this sort of thing, e.g. when detecting passive voice.
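As a rough illustration of how a checker might flag passive voice, here is a heuristic sketch in Python: look for a form of "be", optionally followed by an "-ly" adverb, then something that looks like a past participle. The participle list and the "-ed" rule are simplifying assumptions, not how any particular grammar checker works.

```python
from typing import List, Tuple

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
# Hypothetical, deliberately tiny list of irregular past participles;
# a real checker would rely on a part-of-speech tagger instead.
IRREGULAR_PARTICIPLES = {"sung", "written", "eaten", "taken", "given", "seen", "done", "made"}


def looks_like_participle(word: str) -> bool:
    # Crude: any "-ed" word counts, so adjectives like "tired" are false positives.
    return word in IRREGULAR_PARTICIPLES or word.endswith("ed")


def find_passive(text: str) -> List[Tuple[int, str]]:
    """Return (word index, phrase) for each 'be (+ -ly adverb) + participle'."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    hits: List[Tuple[int, str]] = []
    for i, word in enumerate(words):
        if word not in BE_FORMS:
            continue
        j = i + 1
        if j < len(words) and words[j].endswith("ly"):  # e.g. "was quickly written"
            j += 1
        if j < len(words) and looks_like_participle(words[j]):
            hits.append((i, " ".join(words[i:j + 1])))
    return hits


print(find_passive("The song will be sung by Alice."))  # [(3, 'be sung')]
print(find_passive("Alice will sing the song."))        # []
```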

A C# implementation for this sort of language processing would be really neat, but I think the bulk of the effort lies in coming up with the right rules. Keeping the rules clear and understandable seems like a good place to begin.

Upvotes: 7

Views: 1072

Answers (6)

Edward Kmett

Reputation: 29962

If you aren't tied to a particular language, Haskell has Aarne Ranta's Grammatical Framework:

http://www.grammaticalframework.org/

which is explicitly designed to generate parsers, etc., for this sort of natural language processing.

Upvotes: 2

Myrddin Emrys

Reputation: 43975

I am not aware of any existing syntax for the kind of English-language processing you describe. You would need to create your own DSL on top of one of the available toolsets (such as WordNet).

Upvotes: -1

Bevan

Reputation: 44307

One good place to start researching would be WordNet: it's a semantic dictionary that groups words by similar meaning and also records the relationships between words in useful ways.

There are a number of software projects built on the WordNet corpus; one of them may be what you need.
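To make the "grouping by meaning plus relationships" idea concrete, here is a toy in-memory model in Python. The synset ids and contents are invented for illustration and this is not the WordNet API; from Python, NLTK's `nltk.corpus.wordnet` module exposes the real data (for example `wordnet.synsets("fast")`).

```python
from typing import Dict, List, Set

# Toy stand-in for WordNet data (invented for illustration, not real entries).
# Each synset id maps to the words (lemmas) that share that meaning.
SYNSETS: Dict[str, List[str]] = {
    "fast.adj.1": ["fast", "rapid", "quick", "speedy"],
    "car.noun.1": ["car", "automobile", "auto"],
    "vehicle.noun.1": ["vehicle"],
}
# One kind of relationship between synsets: "is a kind of" (hypernym).
HYPERNYM: Dict[str, str] = {"car.noun.1": "vehicle.noun.1"}


def synonyms(word: str) -> Set[str]:
    """Every other word that shares at least one synset with `word`."""
    result: Set[str] = set()
    for members in SYNSETS.values():
        if word in members:
            result.update(m for m in members if m != word)
    return result


def hypernym_chain(synset_id: str) -> List[str]:
    """Follow hypernym links upward until none remain."""
    chain = [synset_id]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


print(sorted(synonyms("fast")))      # ['quick', 'rapid', 'speedy']
print(hypernym_chain("car.noun.1"))  # ['car.noun.1', 'vehicle.noun.1']
```

Real WordNet lists several senses for a word like "fast", including a verb sense meaning to abstain from food, which is why a substitution rule usually needs the part of speech as well as the word.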

Upvotes: 2

CVertex

Reputation: 18237

If you want something more robust for natural language parsing/transforming, you could try the C# port of OpenNLP.

Upvotes: 0

bugmagnet

Reputation: 7769

A good place to start would be SIL's CARLAStudio, with its "Computer Assisted Related Language Adaptation" suite, or alternatively SIL's Adapt It. SIL has a huge range of linguistic analysis software, which is the direction you appear to be heading. It's certainly a big jump from regular expressions, which don't care about meaning, to something that can handle linguistic analysis.

Upvotes: 0
