mzdravkov

Reputation: 145

Clojure recursive function execution time

I'm programming a web crawler in Clojure, and it takes constant time regardless of the depth I give to the function. Here is the function itself (it uses Clojure Soup, but I think that's irrelevant):

(defn crawl [source current-depth max-depth]
    (if (= current-depth 0) (write-to-hf "{" false))
    (let [targets (get-links source)]
        (write-to-hf (str ":" source " " (seq targets) "\n") true)
        (if (< current-depth max-depth)
            (map crawl targets (repeat (inc current-depth)) (repeat max-depth))
            (if (= current-depth 0)
                (do (write-to-hf "}" true)
                 targets)
                targets))))

(write-to-hf is a function that writes the data to a file, so I guess it's irrelevant to the problem too.)

When I tested my function in the REPL by evaluating:

(crawl "http://bg.wikipedia.org" 0 1)

it took about an hour to print all the links, but if I put the result into a var, it takes less than a second:

(def a (crawl "http://bg.wikipedia.org" 0 1))

This looks normal to me, because I/O operations are the most expensive part, but when I tried measuring how long it takes to put the result into a var with more layers of recursion depth, the time seems to stay constant. Even doing:

((crawl "http://bg.wikipedia.org" 0 100000000000))

takes the same amount of time.

Can someone explain to me why the time is constant? I can't imagine how collecting links from billions of Wikipedia pages (it's a huge website with hundreds of links on every page) could be done in less than a second.

Upvotes: 2

Views: 227

Answers (1)

Arthur Ulfeldt

Reputation: 91534

This line is producing a lazy sequence of crawled links:

(map crawl targets (repeat (inc current-depth)) (repeat max-depth))

The actual crawling happens when the links are printed (by the REPL in this case), so when you just save them into a var and don't look at them, no work is being done. Doing nothing takes approximately constant time. Wrap that line in a call to doall to make it non-lazy.
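As a minimal sketch of that change, keeping the rest of the question's function unchanged (get-links and write-to-hf are assumed to be the same helpers from the question):

(defn crawl [source current-depth max-depth]
    (if (= current-depth 0) (write-to-hf "{" false))
    (let [targets (get-links source)]
        (write-to-hf (str ":" source " " (seq targets) "\n") true)
        (if (< current-depth max-depth)
            ;; doall realizes the lazy seq right here, so every recursive
            ;; crawl call actually runs before this function returns
            (doall (map crawl targets (repeat (inc current-depth)) (repeat max-depth)))
            (if (= current-depth 0)
                (do (write-to-hf "}" true)
                 targets)
                targets))))

If you only care about the side effect of writing to the file and don't need the returned links, dorun would force the sequence without retaining the realized elements in memory.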

Upvotes: 6
