zcaudate

Reputation: 14258

A regular expression that can split a string having nested brackets that are the same

I know that regular expressions can be used to write checkers that match pairs of distinct start and end bracket symbols:

e.g. a.[b.[c.d]].e yields the values a, [b.[c.d]], and e

How can I write a regular expression that can work out the start and end brackets when both are the same symbol?

e.g. a.|b.|c.d||.e would yield the values a, |b.|c.d||, and e

Update

Thanks for all the comments. I should give some context for the question: I basically want to mimic JavaScript property-access syntax.

a.hello is a["hello"] or a.hello
a.|hello| is a[hello]
a.|b.c.|d.e||.f.|g| is a[b.c[d.e]].f[g]

So what I want to do is break the expression into its top-level parts:

 [`a`, `|b.c.|d.e||`, `f`, `|g|`]

and then recurse into each part that is pipe-quoted.
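The split-then-recurse idea can be sketched as follows. This is only an illustration, not a regular expression and not the code from the linked repository: `split-top-level`, `pipe-quoted?` and `to-js` are hypothetical names, and the sketch assumes that a pipe directly after a dot (or at the very start) opens a quoted section while any other pipe closes one. It does no validation of unbalanced pipes.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: split on dots that are not inside pipes.
;; Assumption: a pipe whose previous character is a dot opens a
;; section; every other pipe closes one. `prev` starts as \. so that
;; a pipe at the very start also counts as an opener.
(defn split-top-level [s]
  (loop [chars (seq s), prev \., depth 0, current "", out []]
    (if-let [ch (first chars)]
      (cond
        ;; a dot outside all pipes ends the current part
        (and (= ch \.) (zero? depth))
        (recur (next chars) ch depth "" (conj out current))

        ;; a pipe changes the nesting depth and stays in the part
        (= ch \|)
        (recur (next chars) ch
               (if (= prev \.) (inc depth) (dec depth))
               (str current ch) out)

        :else
        (recur (next chars) ch depth (str current ch) out))
      (conj out current))))

(defn pipe-quoted? [part]
  (and (>= (count part) 2)
       (str/starts-with? part "|")
       (str/ends-with? part "|")))

;; Recurse into pipe-quoted parts, emitting bracket access for them
;; and plain dot access for everything else.
(defn to-js [s]
  (let [[head & more] (split-top-level s)]
    (apply str head
           (map (fn [part]
                  (if (pipe-quoted? part)
                    (str "[" (to-js (subs part 1 (dec (count part)))) "]")
                    (str "." part)))
                more))))

(split-top-level "a.|b.c.|d.e||.f.|g|")
;; => ["a" "|b.c.|d.e||" "f" "|g|"]

(to-js "a.|b.c.|d.e||.f.|g|")
;; => "a[b.c[d.e]].f[g]"
```

The depth counter is what a plain regular expression lacks here, which is why the sketch walks the string character by character.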

I've got an implementation of the syntax without pipes here:

https://github.com/zcaudate/purnam

I'm really hoping not to use a parser, mainly because I don't know how to write one and I don't think the problem justifies the extra complexity. But if regex can't cut it, I may have to.

Upvotes: 4

Views: 491

Answers (1)

zcaudate

Reputation: 14258

Thanks to @m.buettner and @rafal, this is my code in Clojure.

Following what m.buettner described, the splitter has two modes, a normal mode and a pipe mode, and switches between them as it reads the string one character at a time.

Helpers:

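(defn conj-if-str
  "Appends s to arr unless s is empty."
  [arr s]
  (if (empty? s) arr
      (conj arr s)))

;; Binds `bound` to `var`, then dispatches on it with `case`;
;; `var` stays visible inside every clause of `body`.
(defmacro case-let [[var bound] & body]
  `(let [~var ~bound]
     (case ~var ~@body)))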
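As a small usage sketch of the `case-let` helper (the `classify` function below is hypothetical and only there for illustration), the macro binds a value once and then dispatches on it, so the default clause can still refer to the bound name:

```clojure
;; Same macro as above, repeated so the sketch is self-contained.
(defmacro case-let [[var bound] & body]
  `(let [~var ~bound]
     (case ~var ~@body)))

;; Hypothetical example: classify the first character of a string.
(defn classify [s]
  (case-let [ch (first s)]
    nil :end          ; empty input, (first "") is nil
    \.  :dot
    \|  :pipe
    [:other ch]))     ; default clause can use the bound ch

(classify "")     ;; => :end
(classify ".x")   ;; => :dot
(classify "|x|")  ;; => :pipe
(classify "abc")  ;; => [:other \a]
```

This is the same shape both modes below use: look at the next character, then branch on end of input, a dot, a pipe, or anything else.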

Pipe Mode:

(declare split-dotted) ;; normal mode declaration

(defn split-dotted-pipe   ;; pipe mode
  ([output current ss] (split-dotted-pipe output current ss 0))
  ([output current ss level]
      (case-let
       [ch (first ss)]
       nil (throw (Exception. "Cannot have an unpaired pipe"))
       \|  (case level
             0 (trampoline split-dotted
                           (conj output (str current "|"))
                           "" (next ss))
             (recur output (str current "|") (next ss) (dec level)))
       \.  (case-let
            [nch (second ss)]
            nil (throw (Exception. "Incomplete dotted symbol"))
            \|  (recur output (str current ".|") (nnext ss) (inc level))
            (recur output (str current "." nch) (nnext ss) level))
       (recur output (str current ch) (next ss) level))))

Normal Mode:

(defn split-dotted
  ([ss]
     (split-dotted [] "" ss))
  ([output current ss]
     (case-let
      [ch (first ss)]
       nil (conj-if-str output current)
       \.  (case-let
            [nch (second ss)]
            nil (throw (Exception. "Cannot have . at the end of a dotted symbol"))
            \|  (trampoline split-dotted-pipe
                            (conj-if-str output current) "|" (nnext ss))
            (recur (conj-if-str output current) (str nch) (nnext ss)))
       \|  (throw (Exception. "Cannot have | during split mode"))
       (recur output (str current ch) (next ss)))))

Tests:

(fact "split-dotted"
  (js/split-dotted "a") => ["a"]
  (js/split-dotted "a.b") => ["a" "b"]
  (js/split-dotted "a.b.c") => ["a" "b" "c"]
  (js/split-dotted "a.||") => ["a" "||"]
  (js/split-dotted "a.|b|.c") => ["a" "|b|" "c"]
  (js/split-dotted "a.|b|.|c|") => ["a" "|b|" "|c|"]
  (js/split-dotted "a.|b.c|.|d|") => ["a" "|b.c|" "|d|"]
  (js/split-dotted "a.|b.|c||.|d|") => ["a" "|b.|c||" "|d|"]
  (js/split-dotted "a.|b.|c.d.|e|||.|d|") => ["a" "|b.|c.d.|e|||" "|d|"])

(fact "split-dotted exceptions"
  (js/split-dotted "|a|") => (throws Exception)
  (js/split-dotted "a.") => (throws Exception)
  (js/split-dotted "a.|||") => (throws Exception)
  (js/split-dotted "a.|b.||") => (throws Exception))

Upvotes: 1
