Unstack

Reputation: 561

Extract parts of HTML tag using R

Simply put, I'm trying to parse an HTML document which contains, somewhere, the following tag:

<meta property="article:tag" content="myContent"/>

How can I return the 'content' part of this tag using R?

I've been trying to do this with the XML package, but I think I'm heading down a rabbit hole...

Upvotes: 2

Views: 993

Answers (1)

Unstack

Reputation: 561

Using the XML package, it looks like I can do something like:

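library(XML)
# Parse the page into an internal document so that XPath queries work
src <- htmlTreeParse('http://mywebsite.com/mypage.html', useInternalNodes = TRUE)
# Collect the attributes of every <meta> tag whose property is "article:tag"
tags <- xpathApply(src, "//meta[@property='article:tag']", xmlAttrs)
print(unlist(tags)[["content"]])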
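
A slightly more direct variant, offered as a sketch rather than the definitive way to do it: xmlGetAttr can pull the single attribute straight from each matching node, which avoids unlisting the whole attribute vector. The inline HTML string and the variable names here are assumptions for illustration, so that the example runs without fetching a real page.

```r
library(XML)

# Hypothetical stand-in for the real page, so the example is self-contained
html <- '<html><head><meta property="article:tag" content="myContent"/></head><body></body></html>'

# asText = TRUE tells htmlParse the input is markup, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# Return the "content" attribute of every matching <meta> node
content <- xpathSApply(doc, "//meta[@property='article:tag']", xmlGetAttr, "content")
print(content)
```

If the page has several article:tag meta tags, this should return one element per tag instead of only the first.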

Upvotes: 2
