zcaudate
zcaudate

Reputation: 14258

A regex style matching library for generic matching on list items

I've seen a library like this around before but then forgotten about what it was called.

You can specify a pattern that matches elements in a list, something along the lines of:

(def oddsandevens (pattern (n-of odd? :*) (n-of even? 2) :$))

(pattern-match oddsandevens [1 1 2 2]) => true
(pattern-match oddsandevens [1 1 1 2 2]) => true

(pattern-match oddsandevens [1 1 2 2 2]) => false
(pattern-match oddsandevens [1 1 2]) => false

If I'm totally imagining this, can someone shed light on how one might write one of these things?

Upvotes: 7

Views: 233

Answers (2)

A. Webb
A. Webb

Reputation: 26446

More generally, you are asking for an expressive way to parse a sequence. There are of course many parsing libraries out there for Clojure, but many of them complect the lexing with the parsing (there may be good reasons for this in terms of optimizing performance), and so can only be used on strings. You might have to look outside the toolbox to find a parser that allows lexing as a separate concern.

Take, for example, The Parsatron (weighing only 262 loc)

(require '[the.parsatron ; sampling of available combinators 
            :refer [run token attempt many times choice always never >> eof]])

(defn matches? [parser input] 
  (run 
    (choice 
      (attempt (>> parser (eof) (always true))) 
      (always false))
    input))

Now define your pattern

(def odds-and-evens (>> (many (token odd?)) (times 2 (token even?))))

And test

(matches? odds-and-evens [1 1 2 2])   ;=> true
(matches? odds-and-evens [1 1 1 2 2]) ;=> true
(matches? odds-and-evens [1 1 2 2 2]) ;=> false
(matches? odds-and-evens [1 1 2])     ;=> false

From here you can add sugar to specify your pattern as desired.

Upvotes: 2

Thumbnail
Thumbnail

Reputation: 13483

Too long for a comment:

If you're thinking of growing your own, you might find Brzozowski's article on Derivatives of Regular Expressions useful. It describes an efficient technique for converting a regular expression into a state machine that recognizes it. An alternative source is John Conway's book Regular Algebra and Finite Machines.

I've a feeling that Christopher Grand's GitHub project, found by @leontalbot, has done the work already, but as I don't have a proper development environment to hand, I can't investigate. Although his terminal classes are strings, he defines a protocol for regexes that his unions and sequences (and stars?) comply with.

Looking around other languages.

  • Scala does what Clojure does: wrap the Java regexes.
  • The Boost folk have something for C++. It's not clear whether it requires the terminals to be characters.

Sorry I can't be more helpful.


Added notes:

Upvotes: 2

Related Questions