Reputation: 41
I have spent hours trying to figure this out, and it seems Rebol just can't do it. Here is a program that downloads all the images from a web page. It was great seeing I could write it in far fewer lines of code, yet the performance is terrible. Rebol times out after downloading 4-5 files. The timeouts were reduced by adding wait 5 at the end of the loop, but that takes far too long!
An identical program written in C downloaded everything in an instant. Here is the part of the Rebol code which downloads the images:
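A reconstruction of that loop, pieced together from the fragments quoted in the answers below (not the exact script, which was posted on pastebin):

; images: block of image urls gathered earlier in the script
foreach image images [
    fname: to-file find/last/tail image "/" ; filename part of the url
    write fname read/binary image           ; downloads, but writes the binary data out as text
    wait 5                                  ; workaround that slowed everything down
]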
Upvotes: 1
Views: 111
Reputation: 183
Was the long wait needed? In long loops Rebol needs a wait now and then to process GUI events, but IIRC wait 0 should do the trick. Is it possible the event queuing is causing problems?
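For example, a minimal sketch of that idea:

loop 10000 [
    ; ...one chunk of work per pass...
    wait 0 ; yield briefly so queued GUI events get processed
]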
Upvotes: 1
Reputation: 1503
Having used REBOL for commercial apps for years, most of them requiring networking in multitudes of ways, I can affirm that REBOL's networking is pretty stable. In fact, it can run servers which have months of uptime without any memory leaks.
But since you have a very specific goal in mind, I thought I'd make a little app which shows you how it could be done.
This definitely works in R2. One problem you might be having is network port timeouts, but that would only occur if the server and/or the images you download take several seconds each, exceeding the 30-second default timeout.
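If the 30-second default is the culprit, it can be raised globally before downloading; a minimal sketch, assuming R2's stock scheme settings:

system/schemes/default/timeout: 0:05:00 ; raise the default port timeout to 5 minutes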
The app below takes a single URL as its parameter (you can set it to whatever you like near the top) and downloads every <IMG> URL it finds on that page. It supports http and https, and I've tested it with a few sites like Wikipedia, Bing, and Google image search; it works pretty well, and download rates are pretty constant on each server. I added speed reporting to the minimal GUI, to give you an idea of download rates.
Note that this is a synchronous application which simply downloads a list of images... you cannot simply add a GUI and expect it to run concurrently, since that requires a completely different network model (async HTTP ports) and more complex networking code.
rebol [
title: "webpage images downloader example"
notes: "works with R2 only"
]
; the last page-url is the one to be used... feel free to change this
page-url: http://en.wikipedia.org/wiki/Dog
page-url: https://www.google.com/search?q=dogs&tbm=isch
page-url: http://www.bing.com/images/search?q=dogs&go=&qs=ds
;------
; automatically setup URL-based information
page-dir: copy/part page-url find/last/tail page-url "/"
page-host: copy/part page-url find/tail find/tail page-url "//" "/" ; handles both http:// and https://
?? page-url
?? page-dir
?? page-host
output-dir: %downloaded-images/ ; save images in a subdir of current-directory
unless exists? output-dir [make-dir output-dir]
images: copy [] ; copy, so the list is empty even when the script is re-run in the same session
;------
; read url (expecting an HTML document)
;
; Parse is used to collect and cleanup URLs, make them absolute URLs.
parse/all read page-url [
    some [
        thru {<img } thru {src="} copy image to {"} (
            case [
                "https://" = copy/part image 8 [image: to-url image]
                "http://" = copy/part image 7 [image: to-url image]
                "//" = copy/part image 2 [image: join http:// at image 3]
                #"/" = pick image 1 [image: join page-host image]
                'default [image: join page-dir image] ; relative url
            ]
            append images image
        )
    ]
]
;------
; pretty-print image list
new-line/all images yes
probe images
;------
; display report window
view/new layout [ field-info: text 500 para [wrap?: false] speed-info: text 500 ]
;------
; download images and report all activity
i: bytes: 0
legal-chars: charset [#"a" - #"z" #"A" - #"Z" "0123456789-_.="] ; chars allowed in filenames
s: now/precise
foreach image images [
    unless attempt [
        i: i + 1
        probe image
        fname: to-string find/last/tail image "/" ; get filename from url
        parse/all fname [some [legal-chars | letter: skip (change letter "-")]] ; convert illegal filename chars
        fname: join output-dir to-file fname ; use url filename to build disk path
        write/binary fname read/binary image ; download file
        ; update GUI
        t: difference now/precise s
        field-info/text: rejoin ["Downloading: (" i "/" length? images ") " fname]
        show field-info
        bytes: bytes + size? fname
        speed-info/text: rejoin ["bytes: " bytes ", time: " t ", speed: " (bytes / 1000) / (to-decimal t) " kB/s"]
        show speed-info
        true ; all is good, attempt should return a value
    ][
        print "^/^/---^/unable to download image:"
        print image
        print "---^/^/"
    ]
]
If you do not require the web page scanner and have a manual list of images to grab, just replace that code with a block of images like so:
images: [
http://server.com/img1.png
http://server.com/img2.png
http://server.com/img3.png
]
and let the download loop do its stuff.
Hope this helps.
Upvotes: 2
Reputation: 4886
You have a number of errors in your script at http://pastebin.com/fTnq8A3m
For example, you have
write ... read/binary ...
so you're reading the image as binary and then writing it out as text.
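The fix is to keep both ends binary:

write/binary %image.jpg read/binary http://www.rebol.com/image.jpg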
Also, you are handling URLs as text when the url! datatype already exists for them. So in
read/binary join http://www.rebol.com/ %image.jpg
the join keeps the url! datatype intact. There's no need to do this:
read/binary to-url join "http://www.rebol.com/" %image.jpg
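You can confirm in the console that the type survives the join:

>> type? join http://www.rebol.com/ %image.jpg
== url!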
What size are these images?
Adding wait 5 won't affect the download either, as you're attempting a blocking synchronous download; and since you're using a button, you're inside VID, which then means using a wait inside a wait.
Another way to do this would be to set up an async handler and then start the downloads, so that you don't block the GUI as you do now.
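A minimal sketch of that pattern, using a raw /no-wait TCP port and an awake handler (schematic only, not a full HTTP client; connection errors and header parsing are omitted):

buffer: copy ""
port: open/direct/no-wait tcp://www.rebol.com:80
insert port "GET / HTTP/1.0^M^JHost: www.rebol.com^M^J^M^J" ; send the request
port/awake: func [port] [
    ; called by WAIT whenever the port has activity
    either data: copy port [
        append buffer data
        false ; keep waiting for more data
    ][
        true ; copy returned none: connection closed, stop waiting
    ]
]
append system/ports/wait-list port
wait [] ; dispatches port and GUI events, so the GUI stays responsive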
Upvotes: 2