Reputation: 561
Simply put, I'm trying to parse an HTML document which contains, somewhere, the following tag:
<meta property="article:tag" content="myContent"/>
How can I extract the value of the 'content' attribute from this tag using R?
I've been trying to do this with the XML package, but I think I'm heading down a rabbit hole...
Upvotes: 2
Views: 993
Reputation: 561
Using the XML package, it looks like I can do something like:
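library(XML)
# Parse the page into an internal document so XPath queries work
src <- htmlTreeParse('http://mywebsite.com/mypage.html', useInternalNodes = TRUE)
# Collect the attributes of every matching <meta> tag
tags <- xpathApply(src, "//meta[@property='article:tag']", xmlAttrs)
print(unlist(tags)[["content"]])  # content of the first matching tag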
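If the page has more than one article:tag meta element, it may be simpler to pull the attribute out directly with xmlGetAttr. The following is only a sketch: the inline HTML string and the second tag value ("otherContent") are made up so that it runs without fetching a real page.

```r
library(XML)

# Hypothetical inline document standing in for the real page
html <- '<html><head>
<meta property="article:tag" content="myContent"/>
<meta property="article:tag" content="otherContent"/>
</head><body></body></html>'

# asText = TRUE tells the parser the argument is markup, not a file or URL
doc <- htmlParse(html, asText = TRUE)

# xmlGetAttr pulls the named attribute from each node matched by the XPath
contents <- xpathSApply(doc, "//meta[@property='article:tag']",
                        xmlGetAttr, "content")
print(contents)
```

This should give one element per matching tag, so there is no need to unlist and index by name.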
Upvotes: 2