Reputation: 2402
I am wondering whether there is any way to download files from a website with VBScript.
I know how to download a single file from a website, but how can I turn that into a loop? And how can I search a particular page for a certain file extension and download the file(s) if any are available?
' Pseudocode of what I am after:
For Each pdf In website
    xhr.Open "GET", pdf.src, False
    xhr.Send
    Set stream = CreateObject("Adodb.Stream")
    With stream
        .Type = 1
        .Open
        .Write xhr.responseBody
        .SaveToFile "C:\temp\" & CStr(index) & ".pdf", 2
    End With
    stream.Close
    Set stream = Nothing
    index = index + 1
Next
Let's say we have a website https://website.com/productpage/ that contains links which all have the same structure, https://website.com/products/xx-x-xx-x/, so all of the links I need start with https://website.com/products/. According to the page source there seem to be 33 links of that kind. After following one of those links there are PDF files on the page: sometimes one, sometimes 3 or 4. The link to a PDF file looks like https://website.com/wp-content/uploads/2016/12/xxxx.pdf, where xxxx.pdf stands for the actual filename.
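I guess something like this could collect the product links (untested sketch, assuming the links appear literally in the page source and can be matched by the https://website.com/products/ prefix):

Dim xhr, re, oMatch
Set xhr = CreateObject("Microsoft.XMLHTTP")
xhr.Open "GET", "https://website.com/productpage/", False
xhr.Send

Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True
re.Pattern = "https://website\.com/products/[^""'\s]+"   ' the product-link prefix described above

' Each match is the URL of one product page found in the source
For Each oMatch In re.Execute(xhr.responseText)
    WScript.Echo oMatch.Value
Next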
Here is what I have managed to get for one file:
Dim xHttp: Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim bStrm: Set bStrm = CreateObject("Adodb.Stream")

xHttp.Open "GET", "https://website.com/wp-content/uploads/2016/12/xxxx.pdf", False
xHttp.Send

With bStrm
    .Type = 1   '// binary
    .Open
    .Write xHttp.responseBody
    .SaveToFile "c:\temp\xxxx.pdf", 2   '// overwrite
End With
EDIT:
Should the flow go like this?
Structure of the website:

https://website.com/productpage/
    https://website.com/products/xx-x/
        https://website.com/wp-content/uploads/2016/12/xx-xx.pdf
    https://website.com/products/xxxxx-xsx/
        https://website.com/wp-content/uploads/2018/12/x-xx-x.pdf
        https://website.com/wp-content/uploads/2015/12/x-x-xx.pdf
        https://website.com/wp-content/uploads/2019/12/xxx-x.pdf
    https://website.com/products/x-xx-xsx/
        https://website.com/wp-content/uploads/2014/12/x-xxx.pdf
        https://website.com/wp-content/uploads/2013/12/x-x-x-x.pdf
    https://website.com/products/xx-x-xsx/
        https://website.com/wp-content/uploads/2012/12/x-xxxx.pdf
Upvotes: 2
Views: 162
Reputation: 5031
Since you already have code that saves one file, you can wrap it in a Sub for re-use:
Sub GetFile(p_sRemoteFile, p_sLocalFile)
    Dim xHttp: Set xHttp = CreateObject("Microsoft.XMLHTTP")
    Dim bStrm: Set bStrm = CreateObject("Adodb.Stream")

    ' Download the remote file
    xHttp.Open "GET", p_sRemoteFile, False
    xHttp.Send

    ' Write the response body to disk
    With bStrm
        .Type = 1                      '// binary
        .Open
        .Write xHttp.responseBody
        .SaveToFile p_sLocalFile, 2    '// overwrite
        .Close
    End With
End Sub
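For example, with the sample URL and local path from the question:

GetFile "https://website.com/wp-content/uploads/2016/12/xxxx.pdf", "C:\temp\xxxx.pdf"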
Then, you can use the InternetExplorer object to get a collection of links in a page:
Sub GetPageLinks(p_sURL)
    Dim objIE
    Dim objLinks
    Dim objLink
    Dim iCounter

    ' Load the page and wait for it to finish
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    objIE.Navigate p_sURL
    Do Until objIE.ReadyState = 4
        WScript.Sleep 100
    Loop

    ' Walk every anchor on the page
    Set objLinks = objIE.Document.All.Tags("a")
    For iCounter = 1 To objLinks.Length
        Set objLink = objLinks(iCounter - 1)
        With objLink
            If StrComp(Right(.href, 3), "pdf", 1) = 0 Then
                ' PDF link: download it
                GetFile .href, "C:\temp\downloads\" & GetFileNameFromURL(.href)
            ElseIf InStr(1, .href, "https://website.com/products/", 1) = 1 Then
                ' Product page: recurse into it to collect its PDF links
                GetPageLinks .href
            End If
        End With
    Next

    objIE.Quit
    Set objIE = Nothing
End Sub
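To kick things off, call it with the product overview page from the question (a sketch; it assumes C:\temp already exists so the download folder can be created):

' Make sure the download folder used above exists
Dim oFSO: Set oFSO = CreateObject("Scripting.FileSystemObject")
If Not oFSO.FolderExists("C:\temp\downloads") Then oFSO.CreateFolder "C:\temp\downloads"

' Start from the product overview page
GetPageLinks "https://website.com/productpage/"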
Here's a function that extracts the file name from a URL:
Function GetFileNameFromURL(p_sURL)
    Dim arrFields
    arrFields = Split(p_sURL, "/")
    GetFileNameFromURL = arrFields(UBound(arrFields))
End Function
This function will return xxxx.pdf given https://website.com/wp-content/uploads/2016/12/xxxx.pdf.
Upvotes: 2