John Cherry
John Cherry

Reputation: 189

Excel VBA script to find 404 errors in a list of URLs?

So, I have this spreadsheet with a list of about 5000 URLs. (All pages on our corporate intranet.)

We know some of the links are broken, but don't know of a good way to determine which without clicking all 5000 links.

Normally this would be a simple matter: Create a web page with links to the 5000 pages, and then check the links with a tool like Xenu Link Sleuth.

But that won't work in this case because many of the links are being redirected, and the redirect code spoofs HTTP.response 200, which tricks Xenu into treating it as a valid URL.

However, there is some good news: The redirect script does not run from within Excel. If you click a bad link inside Excel, the redirect script does not execute and the HTTP response is reported back to Excel. I believe Excel should be able to identify the correct HTTP response code (404) - or at least whether the link was valid or not.

Which brings me to my question:

Is there a way using VBA to write a script that would click through every link and capture the result? The result captured could be in the form of the HTTP response code or anything else you think would be useful in finding the bad links in this list of 5000 pages. Ideally the result would be written to a cell in the spreadsheet adjacent to the link.

If anyone if familiar enough with VBA to suggest a solution to this problem, I would be eternally grateful!

Upvotes: 2

Views: 11388

Answers (2)

Florent B.
Florent B.

Reputation: 42528

Here is an example to check the status line from a list of URL with Excel:

Sub TestLinks()
  Dim source As Range, req As Object, url$
  Set req = CreateObject("Msxml2.ServerXMLHTTP.6.0")

  ' define were the links and results are
  Set source = Range("A1:B2")

  ' clear the results
  source.Columns(2).Clear

  ' iterate each row
  For i = 1 To source.Rows.count
    ' get the link from the first column
    url = source.Cells(i, 1)

    ' send the request using a HEAD to check the status line
    req.Open "HEAD", url, False
    req.setRequestHeader "Accept", "image/webp,image/*,*/*;q=0.8"
    req.setRequestHeader "Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.setRequestHeader "Cache-Control", "no-cache"
    req.setRequestHeader "Content-Type", "text/xml; charset=utf-8"
    req.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
    req.Send

    ' write the result in the second column
    source.Cells(i, 2) = req.Status
  Next

  MsgBox "Finished!"
End Sub

Upvotes: 4

Sebastian B
Sebastian B

Reputation: 451

Use a user defined function to return HTML-Status Codes and drag it down next to the links. Might take a while for Excel to check 5000 links, though.

Public Function CheckURL(url As String) As String
Dim request As New WinHttpRequest
request.Open "GET", url
request.Send
CheckURL = request.Status
End Function

You will probably need to add a reference to "Microsoft WinHTTP Services" under "Extras" -> "References"

Upvotes: 3

Related Questions