Reputation: 435
I've been stuck for a day trying to find an answer: is it possible in classic ASP, using MSXML2.ServerXMLHTTP.6.0, to parse HTML and extract the content of an HTML node by a given ID? For example:
Remote HTML file:
<html>
.....
<div id="description">
some important notes here
</div>
.....
</html>
ASP code:
<%
...
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
objHTTP.Open "GET", url_of_remote_html, False
objHTTP.Send
...
%>
I've read in a lot of docs that the HTML can be accessed as source (objHTTP.responseText) and as a structure (objHTTP.responseXML). But how can I use that XML response to access the content of that div? I've read and tried many examples, but I can't find anything clear enough to solve this.
Upvotes: 1
Views: 4138
Reputation: 3706
First up, perform the GET request as in your original code snippet:
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
http.Open "GET", url_of_remote_html, False
http.Send
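Next, create a regular expression object and set the pattern to match the inner HTML of an element with the desired id. The pattern uses [\s\S] rather than . because . does not match line breaks in VBScript regular expressions, and the div content in your example spans several lines:
Set regEx = New RegExp
regEx.Pattern = "<div id=""description"">([\s\S]*?)</div>"
regEx.Global = True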
Lastly, pull out the content from the first submatch within the first match:
On Error Resume Next
contents = regEx.Execute(http.responseText)(0).Submatches(0)
On Error Goto 0
If anything goes wrong, for example if the matching element isn't found in the document, contents will be left Empty (the assignment is skipped). If all went to plan, contents should hold the data you're looking for.
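The steps above can be folded into one helper. This is only a sketch: GetDivContentById is a hypothetical name, and it assumes that the id contains no regular expression metacharacters and that the target div has no nested div inside it (the lazy match stops at the first </div>). It checks the match count instead of relying on On Error Resume Next, and uses [\s\S] so the content may span several lines.

```vbscript
<%
' Hypothetical helper, not part of the original answer.
' Returns the inner HTML of the first <div> whose id attribute equals id,
' or an empty string when no such div is found.
Function GetDivContentById(html, id)
    Dim regEx, matches
    Set regEx = New RegExp
    regEx.IgnoreCase = True
    ' Allow other attributes around id, and either quote style.
    ' [\s\S]*? matches lazily across line breaks.
    regEx.Pattern = "<div[^>]*\bid\s*=\s*[""']" & id & "[""'][^>]*>([\s\S]*?)</div>"
    Set matches = regEx.Execute(html)
    If matches.Count > 0 Then
        GetDivContentById = matches(0).SubMatches(0)
    Else
        GetDivContentById = ""
    End If
End Function

Dim http, contents
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
http.Open "GET", url_of_remote_html, False
http.Send

' Only parse the body when the request succeeded.
If http.Status = 200 Then
    contents = GetDivContentById(http.responseText, "description")
End If
%>
```

Working from responseText avoids responseXML, which is only populated when the response parses as well-formed XML, something ordinary HTML pages often are not.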
Upvotes: 2