I have a simple HTML site with tables whose rows I want to parse. There are infinitely many (or close to it) pages containing such tables, e.g. page http://example.com/?page=1 has one table and page http://example.com/?page=2 has the next one.
I already have basic functions like:
(defn next-page [link] ...) ; given http://example.com/?page=2, returns http://example.com/?page=3
(defn parse [link] ...) ; returns a list of rows from the table parsed from the HTML
Now I want to write a function that takes a starting link and creates an infinite stream of all rows: first those from the given link, then those from each following link.
Example:
table on site: http://example.com/?page=2
|--------------------|
| table 2 |
|--------------------|
| row1: value21 |
| row2: value22 |
| row3: value23 |
|--------------------|
(deftest should-parse
  (is (= '(value21 value22 value23)
         (parse "http://example.com/?page=2"))))
table on site: http://example.com/?page=3
|--------------------|
| table 3 |
|--------------------|
| row1: value31 |
| row2: value32 |
|--------------------|
This should be true:
(deftest should-return-stream-with-rows
  (is (= '(value21 value22 value23 value31 value32)
         (take 5 (row-stream "http://example.com/?page=2")))))
Upvotes: 2
Views: 92
Reputation: 17849
If I understand you correctly, you may want to use mapcat + iterate.
Let's make mock functions that behave like yours (I guess):
(defn next-page [page-id] (inc page-id))
(defn parse [page-id]
  (map #(str "link-" page-id "-" %)
       (take (rand-int 10) (range))))
;; parse produces a random number (0 to 9) of links per page, e.g.:
;; (parse 0) => ("link-0-0" "link-0-1" "link-0-2" "link-0-3")
So you can model your desired sequence like this:
(defn all-links [starting-page-id]
  (mapcat parse
          (take-while some? (iterate next-page starting-page-id))))
It iterates over the successive next-page results, starting with the first page, parses each page, and concatenates the parsed results into one lazy sequence. Notice that the iteration stops as soon as next-page returns nil (thanks to take-while).
In the REPL, all-links with the mock functions above gives:
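The mock next-page above never returns nil, so the take-while cutoff is never exercised. A minimal sketch of the terminating case, assuming a hypothetical site with only pages 0 to 2 and a deterministic parse that yields two links per page (both functions are illustrative stand-ins, not your real ones):

```clojure
;; Hypothetical: only pages 0, 1 and 2 exist; there is no page after 2.
(defn next-page [page-id]
  (when (< page-id 2)
    (inc page-id)))

;; Hypothetical deterministic stand-in for parse: two links per page.
(defn parse [page-id]
  (map #(str "link-" page-id "-" %) (range 2)))

(defn all-links [starting-page-id]
  (mapcat parse
          (take-while some? (iterate next-page starting-page-id))))

(all-links 0)
;; => ("link-0-0" "link-0-1" "link-1-0" "link-1-1" "link-2-0" "link-2-1")
```

Because take-while ends the page sequence at the first nil, the result is finite and next-page is never called with nil.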
user> (take 20 (all-links 0))
("link-0-0" "link-0-1" "link-0-2" "link-0-3" "link-0-4"
"link-0-5" "link-1-0" "link-1-1" "link-2-0" "link-2-1"
"link-2-2" "link-3-0" "link-3-1" "link-3-2" "link-3-3"
"link-3-4" "link-3-5" "link-3-6" "link-3-7" "link-4-0")
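Applied to the URLs from the question, the same idea might look like the sketch below. The next-page and parse here are hypothetical stand-ins (a regex that bumps the page number, and canned rows in a map instead of real HTTP and HTML parsing); only row-stream is the part that would carry over to your real functions.

```clojure
;; Hypothetical stand-in for the real next-page: increments the ?page=N number.
(defn next-page [link]
  (let [[_ base n] (re-matches #"(.*\?page=)(\d+)" link)]
    (str base (inc (Long/parseLong n)))))

;; Hypothetical canned data replacing real HTTP requests and HTML parsing.
(def fake-site
  {"http://example.com/?page=2" '(value21 value22 value23)
   "http://example.com/?page=3" '(value31 value32)})

(defn parse [link]
  (get fake-site link ()))

;; Lazy stream of rows: rows of the starting page, then of each following page.
(defn row-stream [link]
  (mapcat parse (iterate next-page link)))

(take 5 (row-stream "http://example.com/?page=2"))
;; => (value21 value22 value23 value31 value32)
```

Two caveats with this sketch: since there is no take-while here, asking for more rows than exist (e.g. take 6 with this fake data) never returns, and mapcat may parse a few pages ahead of what is actually consumed, which matters if each parse is a network request.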
Upvotes: 3