MAGx2

Reputation: 3189

Create an infinite stream from smaller streams

I have a simple HTML site with table rows that I want to parse. There are infinitely many (or nearly infinitely many) pages containing these tables, e.g. on page http://example.com/?page=1 there is a table, and on page http://example.com/?page=2 there is the next table.

I already have basic functions like:

(defn next-page [link] ...) ; given http://example.com/?page=2 returns http://example.com/?page=3
(defn parse [link] ...) ; returns a list of rows from the table parsed from the page's HTML
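For reference, here is one hypothetical way next-page could be written, assuming the page number is always the trailing page=N query parameter (the real URLs may differ). It returns nil when the URL has no page number, so a caller can treat nil as "no more pages".

```clojure
;; Hypothetical sketch of next-page, assuming URLs end with "page=N".
(defn next-page [link]
  ;; re-find returns [full-match digits], or nil when there is no page number
  (when-let [[_ n] (re-find #"page=(\d+)$" link)]
    ;; drop the old digits and append the incremented page number
    (str (subs link 0 (- (count link) (count n)))
         (inc (Long/parseLong n)))))

;; (next-page "http://example.com/?page=2") => "http://example.com/?page=3"
```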

Now I want to write a function that takes a starting link and creates an infinite stream of all rows: first the rows from the given link, then the rows from each following link.

Example:

table on site: http://example.com/?page=2
|--------------------|
|   table 2          |
|--------------------|
| row1: value21      |
| row2: value22      |
| row3: value23      |
|--------------------|
(deftest should-parse
    (is (=
        '(value21 value22 value23)
        (parse "http://example.com/?page=2"))))

table on site: http://example.com/?page=3
|--------------------|
|   table 3          |
|--------------------|
| row1: value31      |
| row2: value32      |
|--------------------|

This should be true:

(deftest should-return-stream-with-rows
    (is (=
        '(value21 value22 value23 value31 value32)
        (take 5 (row-stream "http://example.com/?page=2")))))

Upvotes: 2

Views: 92

Answers (1)

leetwinski

Reputation: 17849

If I understand you correctly, you may want to use mapcat + iterate:

Let's make functions that act like yours (I guess):

(defn next-page [page-id] (inc page-id))

(defn parse [page-id]
  (map #(str "link-" page-id "-" %)
       (take (rand-int 10) (range))))

;; this one makes a random number of links:
;; (parse 0) => ("link-0-0" "link-0-1" "link-0-2" "link-0-3")

So you can model your desired sequence like this:

(defn all-links [starting-page-id]
  (mapcat parse
          (take-while some? (iterate next-page starting-page-id))))

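It lazily produces page ids with iterate (the starting page, then next-page of it, and so on), calls parse on each one, and concatenates all the results with mapcat. Iteration stops as soon as next-page returns nil (thanks to take-while). The next-page above never returns nil, so here the sequence is infinite and you should only take from it.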
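To see the nil check in action, here is a small variant with hypothetical names (next-page-finite, parse-fixed, all-links-finite) where only pages 0 to 2 exist and every page has exactly two links, so the result is deterministic and finite:

```clojure
;; Hypothetical finite variant: only pages 0, 1 and 2 exist.
;; Returns nil after page 2 (and for nil input), which ends the sequence.
(defn next-page-finite [page-id]
  (when (and (some? page-id) (< page-id 2))
    (inc page-id)))

;; Deterministic stub: every page has exactly two links.
(defn parse-fixed [page-id]
  (map #(str "link-" page-id "-" %) (range 2)))

;; Same shape as all-links; take-while stops at the first nil page id.
(defn all-links-finite [starting-page-id]
  (mapcat parse-fixed
          (take-while some? (iterate next-page-finite starting-page-id))))

;; (all-links-finite 0)
;; => ("link-0-0" "link-0-1" "link-1-0" "link-1-1" "link-2-0" "link-2-1")
```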

in repl:

user> (take 20 (all-links 0))
("link-0-0" "link-0-1" "link-0-2" "link-0-3" "link-0-4" 
 "link-0-5" "link-1-0" "link-1-1" "link-2-0" "link-2-1" 
 "link-2-2" "link-3-0" "link-3-1" "link-3-2" "link-3-3" 
 "link-3-4" "link-3-5" "link-3-6" "link-3-7" "link-4-0")
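Applied to the question's own example, the same mapcat + iterate shape works with URLs. This is only a sketch: parse and next-page below are hypothetical stand-ins (an in-memory map holding the two tables from the question, and a next-page that stops after page 3), not real HTML fetching or parsing.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Hypothetical stand-in for the HTML pages: rows keyed by URL.
(def fake-pages
  {"http://example.com/?page=2" '(value21 value22 value23)
   "http://example.com/?page=3" '(value31 value32)})

;; Hypothetical parse: looks rows up instead of fetching and parsing HTML.
(defn parse [link]
  (get fake-pages link))

;; Hypothetical next-page: only pages 2 and 3 exist in this sketch,
;; so it returns nil after page 3, which ends the stream.
(defn next-page [link]
  (when (= link "http://example.com/?page=2")
    "http://example.com/?page=3"))

;; Same idea as all-links, but over URLs.
(defn row-stream [start-link]
  (mapcat parse (take-while some? (iterate next-page start-link))))

(deftest should-return-stream-with-rows
  (is (= '(value21 value22 value23 value31 value32)
         (take 5 (row-stream "http://example.com/?page=2")))))
```

One general Clojure caveat: mapcat is implemented with apply concat, which can realize a few elements of its input ahead of demand, so with real HTTP fetching parse may be called on a page or two more than take strictly needs. If that matters, a hand-written lazy-seq version can avoid it.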

Upvotes: 3
