Reputation: 41
I have spent hours trying to figure this out, and it seems Rebol just can't do it. Here is a program that downloads all the images from a web page. It was great seeing I could write it in far fewer lines of code, yet the performance is terrible. Rebol times out after downloading 4-5 files. The timeouts were reduced by adding wait 5 at the end of the loop, but that takes far too long!
An identical program written in C downloaded everything in an instant. Here is the part of the Rebol code which downloads the images:
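A reconstruction of that loop, pieced together from the fragments quoted in the answers below (not the exact script, which was posted on pastebin):

; images: block of image urls gathered earlier in the script
foreach image images [
    fname: to-file find/last/tail image "/" ; filename part of the url
    write fname read/binary image           ; downloads, but writes the binary data out as text
    wait 5                                  ; workaround that slowed everything down
]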
Upvotes: 1
Views: 111
Reputation: 183
Was the long wait needed? In long loops Rebol needs a wait now and then to process GUI events, but IIRC wait 0 should do the trick. Is it possible the event queuing is causing problems?
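For example, a minimal sketch of that idea:

loop 10000 [
    ; ...one chunk of work per pass...
    wait 0 ; yield briefly so queued GUI events get processed
]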
Upvotes: 1
Reputation: 1503
Having used REBOL for commercial apps for years, most of them requiring networking in multitudes of ways, I can affirm that REBOL's networking is pretty stable. In fact, it can run servers which have months of uptime without any memory leaks.
But since you have a very specific goal in mind, I thought I'd make a little app which shows you how it could be done.
This definitely works in R2. One problem you might be having is network port timeouts, but that would only occur if the server and/or the images you download take several seconds each, exceeding the 30-second default timeout.
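If the 30-second default is the culprit, it can be raised globally before downloading; a minimal sketch, assuming R2's stock scheme settings:

system/schemes/default/timeout: 0:05:00 ; raise the default port timeout to 5 minutes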
The app below takes a single URL as its parameter (you can set it to whatever you like near the top) and downloads every <IMG> URL it finds on that page. It supports http and https, and I've tested it with a few sites like Wikipedia, Bing, and Google image search; it works pretty well, and download rates are pretty constant on each server. I added speed reporting to the minimal GUI, to give you an idea of download rates.
Note that this is a synchronous application which simply downloads a list of images... you cannot simply add a GUI and expect it to run concurrently, since that requires a completely different network model (async HTTP ports) and more complex networking code.
rebol [
title: "webpage images downloader example"
notes: "works with R2 only"
]
; the last page-url is the one to be used... feel free to change this
page-url: http://en.wikipedia.org/wiki/Dog
page-url: https://www.google.com/search?q=dogs&tbm=isch
page-url: http://www.bing.com/images/search?q=dogs&go=&qs=ds
;------
; automatically setup URL-based information
page-dir: copy/part page-url find/last/tail page-url "/"
page-host: copy/part page-url find/tail find/tail page-url "//" "/" ; handles both http:// and https://
?? page-url
?? page-dir
?? page-host
output-dir: %downloaded-images/ ; save images in a subdir of current-directory
unless exists? output-dir [make-dir output-dir]
images: copy [] ; copy, so the list is empty even when the script is re-run in the same session
;------
; read url (expecting an HTML document)
;
; Parse is used to collect and cleanup URLs, make them absolute URLs.
parse/all read page-url [
    some [
        thru {<img } thru {src="} copy image to {"} (
            case [
                "https://" = copy/part image 8 [image: to-url image]
                "http://" = copy/part image 7 [image: to-url image]
                "//" = copy/part image 2 [image: join http:// at image 3]
                #"/" = pick image 1 [image: join page-host image]
                'default [image: join page-dir image] ; relative url
            ]
            append images image
        )
    ]
]
;------
; pretty-print image list
new-line/all images yes
probe images
;------
; display report window
view/new layout [ field-info: text 500 para [wrap?: false] speed-info: text 500 ]
;------
; download images and report all activity
i: bytes: 0
legal-chars: charset [#"a" - #"z" #"A" - #"Z" "0123456789-_.="] ; chars allowed in filenames
s: now/precise
foreach image images [
    unless attempt [
        i: i + 1
        probe image
        fname: to-string find/last/tail image "/" ; get filename from url
        parse/all fname [some [legal-chars | letter: skip (change letter "-")]] ; convert illegal filename chars
        fname: join output-dir to-file fname ; use url filename to build disk path
        write/binary fname read/binary image ; download file
        ; update GUI
        t: difference now/precise s
        field-info/text: rejoin ["Downloading: (" i "/" length? images ") " fname]
        show field-info
        bytes: bytes + size? fname
        speed-info/text: rejoin ["bytes: " bytes ", time: " t ", speed: " (bytes / 1000) / (to-decimal t) " kB/s"]
        show speed-info
        true ; all is good, attempt should return a value
    ][
        print "^/^/---^/unable to download image:"
        print image
        print "---^/^/"
    ]
]
If you do not require the web page scanner and have a manual list of images to grab, just replace that code with a block of images like so:
images: [
http://server.com/img1.png
http://server.com/img2.png
http://server.com/img3.png
]
and let the download loop do its stuff.
Hope this helps.
Upvotes: 2
Reputation: 4886
You have a number of errors in your script at http://pastebin.com/fTnq8A3m
For example, you have
write ... read/binary ...
so you're reading the image as binary and then writing it out as text.
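The fix is to keep both ends binary:

write/binary %image.jpg read/binary http://www.rebol.com/image.jpg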
Also, you are handling URLs as text when the url! datatype already exists for them. So in
read/binary join http://www.rebol.com/ %image.jpg
the join keeps the url! datatype intact. There's no need to do this:
read/binary to-url join "http://www.rebol.com/" %image.jpg
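You can confirm in the console that the type survives the join:

>> type? join http://www.rebol.com/ %image.jpg
== url!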
What size are these images?
Adding wait 5 won't affect the download either, as you're attempting a blocking synchronous download; and since you're using a button, you're inside VID, which then means using a wait inside a wait.
Another way to do this would be to set up an async handler and then start the downloads, so that you don't block the GUI as you do now.
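A minimal sketch of that pattern, using a raw /no-wait TCP port and an awake handler (schematic only, not a full HTTP client; connection errors and header parsing are omitted):

buffer: copy ""
port: open/direct/no-wait tcp://www.rebol.com:80
insert port "GET / HTTP/1.0^M^JHost: www.rebol.com^M^J^M^J" ; send the request
port/awake: func [port] [
    ; called by WAIT whenever the port has activity
    either data: copy port [
        append buffer data
        false ; keep waiting for more data
    ][
        true ; copy returned none: connection closed, stop waiting
    ]
]
append system/ports/wait-list port
wait [] ; dispatches port and GUI events, so the GUI stays responsive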
Upvotes: 2