Extract parts of HTML tag using R

Question

Simply put, I'm trying to parse an HTML document which contains, somewhere, the following tag:

How can I return the 'content' part of this tag using R?

I've been trying to do this with the XML package, but I think I'm heading down a rabbit hole...

Unstack · Accepted Answer

Using the XML package, it looks like I can do something like:

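library(XML)
# Parse the page into an internal document so XPath queries can run against it
src <- htmlTreeParse('http://mywebsite.com/mypage.html', useInternalNodes=TRUE)
# Collect the attributes of every <meta> tag whose property is 'article:tag'
tags <- xpathApply(src, "//meta[@property='article:tag']", xmlAttrs)
print(unlist(tags)[["content"]])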
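
If the page has more than one matching tag, the `[["content"]]` lookup above only returns the first value. A minimal sketch of a variant that returns every 'content' value, assuming the tag in question is a `<meta property="article:tag" ...>` element as the XPath suggests. The inline HTML string and its content values are made up for illustration so the example runs without a network connection:

```r
library(XML)

# Hypothetical stand-in for the real page; the content values are invented
html <- '<html><head>
<meta property="article:tag" content="first-tag">
<meta property="article:tag" content="second-tag">
</head><body></body></html>'

# asText = TRUE tells the parser the input is markup, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# xmlGetAttr pulls a single named attribute from each matching node
contents <- xpathSApply(doc, "//meta[@property='article:tag']", xmlGetAttr, "content")
print(contents)
```

For a live page, the same XPath call should work on the `src` object parsed above in place of `doc`.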
