bearaman

Reputation: 1091

How to save web page content to a text file

I use an automation script that tests a browser-based application. I'd like to save the visible text of each page I load to a text file. This needs to work with the currently open browser window. I've come across some solutions that use InternetExplorer.Application, but they won't work for me because they start a new browser instance, and I need the page that is already open.

Ideally, I'd like to achieve this using VBScript. Any ideas how to do this?

Upvotes: 0

Views: 6229

Answers (1)

Ansgar Wiechers

Reputation: 200213

You can attach to an already running IE instance like this:

Set app = CreateObject("Shell.Application")
For Each window In app.Windows()
  If InStr(1, window.FullName, "iexplore", vbTextCompare) > 0 Then
    Set ie = window
    Exit For
  End If
Next

Then save the document body text like this:

Set fso = CreateObject("Scripting.FileSystemObject")
' 2 = ForWriting, True = create the file if it doesn't exist
Set f = fso.OpenTextFile("output.txt", 2, True)
f.Write ie.document.body.innerText
f.Close

If the page contains non-ASCII characters, you may need to create the output file with Unicode (UTF-16) encoding:

' -1 = TristateTrue (open the file as Unicode)
Set f = fso.OpenTextFile("output.txt", 2, True, -1)

or save it as UTF-8:

Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type     = 2 'text
stream.Position = 0
stream.Charset  = "utf-8"
stream.WriteText ie.document.body.innerText
stream.SaveToFile "output.txt", 2 'overwrite an existing file
stream.Close
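
Putting the pieces above together, a minimal sketch of a complete script might look like the following. The function name FindIEWindow and the relative output path are hypothetical, chosen only for illustration. The sketch picks the first Internet Explorer window it finds, so with several windows or tabs open it may not be the one you want. It also assumes the window exposes the usual Busy and ReadyState properties, and waits for the page to finish loading before reading the text.

```vbscript
Option Explicit

' Hypothetical helper: returns the first open Internet Explorer window,
' or Nothing if no such window is found.
Function FindIEWindow()
  Dim app, window
  Set FindIEWindow = Nothing
  Set app = CreateObject("Shell.Application")
  For Each window In app.Windows()
    If InStr(1, window.FullName, "iexplore", vbTextCompare) > 0 Then
      Set FindIEWindow = window
      Exit For
    End If
  Next
End Function

Dim ie, stream
Set ie = FindIEWindow()
If ie Is Nothing Then
  WScript.Echo "No open Internet Explorer window found."
  WScript.Quit 1
End If

' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
Do While ie.Busy Or ie.ReadyState <> 4
  WScript.Sleep 100
Loop

' Write the visible text as UTF-8.
Set stream = CreateObject("ADODB.Stream")
stream.Type    = 2        ' 2 = text
stream.Charset = "utf-8"
stream.Open
stream.WriteText ie.document.body.innerText
stream.SaveToFile "output.txt", 2   ' 2 = overwrite an existing file
stream.Close
```

Run it with cscript.exe so that the message appears on the console rather than in a dialog box. The exit code 1 is an arbitrary choice to signal that nothing was saved.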

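Edit: Something like this may help to get rid of script code in the document body. Note that it changes the page currently loaded in the browser, because it assigns the stripped markup back to the document: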

Set re = New RegExp
re.Pattern    = "<script[\s\S]*?</script>"
re.IgnoreCase = True
re.Global     = True

ie.document.body.innerHtml = re.Replace(ie.document.body.innerHtml, "")

WScript.Echo ie.document.body.innerText

Upvotes: 6
