Unsatisfied Zebra
Unsatisfied Zebra

Reputation: 143

Avoiding stack overflow from recursion in Clojure

I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.

Kern defines a recursive implementation for a "many-till" parser: source

This works fine for small inputs:

(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")

(def  between-brackets "parser that grabs all text between brackets"
  (between (sym* \{) (sym* \}) (<+> (many (none-of* "}")))))

(def parse-enclosed-words "parser that attempts to grab text between brackets, 
skips a character when it can't, and halts when it hits the string END"
  (many-till (<|> between-brackets (skip any-char)) (token* "END")))

(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")

Unfortunately, the parser encounters stack overflows as input strings grow:

(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError

From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.

How can many-till be modified to avoid stack overflow?

Upvotes: 5

Views: 232

Answers (2)

Alan Thompson
Alan Thompson

Reputation: 29984

There is already a good answer discussing the specifics of recursion with this particular library.

On the more general question of parsing data using Clojure, you should check out the high-quality parser library Instaparse. It is written in Clojure, and is very powerful. You could either use it directly, or for comparison purposes.

See also the video from Clojure/West.

Upvotes: 1

amalloy
amalloy

Reputation: 92117

I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.

Upvotes: 4

Related Questions