Kim Stacks

Reputation: 10812

Extract the inner text of the anchor elements correctly using R

I am using R to scrape the link titles from the sitemap at www.jamesaltucher.com/sitemap.xml

This is my code.

library(XML)
library(RCurl)
url.link <- 'http://www.jamesaltucher.com/sitemap.xml'
blog <- getURL(url.link)
blog <- htmlParse(blog, encoding = "UTF-8")
titles <- xpathSApply(blog, "//a", xmlValue)  ## titles

But titles comes back as an empty list.

Did I use the xpath incorrectly?

Upvotes: 0

Views: 535

Answers (2)

Gaurav Dani

Reputation: 1

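library(XML)

# Read the raw page source, one element per line
web_page <- readLines("http://vueloeyewear.com/shop/retro/black-cia/")

# Keep the lines that mention "strong", then the subset of interest
author_lines <- web_page[grep("strong", web_page)]
author_lines <- author_lines[7:15]

# Collapse to one string, dropping the ", " separators that toString() inserts
test <- gsub(", ", "", toString(author_lines))
test <- gsub("\n", "\n\n", test)

# Parse the fragment as HTML and pull the text of every p element
author_lines <- htmlParse(test)
xpathSApply(author_lines, "//p", xmlValue)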

Look at this example: the name after // in the XPath is the actual tag to match (p here, loc in your sitemap).

Upvotes: 0

CHP
CHP

Reputation: 17189

Yes. A sitemap has no a elements; each URL sits in a loc element, so that is what you need to select.

titles <- xpathSApply(blog, "//loc", xmlValue)
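
To see why //a comes back empty while //loc does not, here is a minimal, self-contained sketch. It does not fetch the real sitemap: sitemap_text is a made-up two-entry stand-in with example.com URLs, and the sm prefix is an arbitrary name chosen for this illustration. It also shows that if you switch from htmlParse to xmlParse, the sitemap's default namespace probably has to be bound to a prefix before //loc will match.

```r
library(XML)

# Hypothetical two-entry sitemap standing in for the real file (assumption).
sitemap_text <- '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/first-post/</loc></url>
  <url><loc>http://example.com/second-post/</loc></url>
</urlset>'

# Parsed as HTML, as in the question: there are no a elements to find,
# but the loc elements are reachable by their plain tag name.
doc_html  <- htmlParse(sitemap_text, asText = TRUE)
anchors   <- xpathSApply(doc_html, "//a", xmlValue)
locs_html <- xpathSApply(doc_html, "//loc", xmlValue)

# Parsed as XML: loc lives in the sitemap namespace, so the XPath
# needs a prefix bound to that namespace URI ("sm" is an arbitrary label).
doc_xml  <- xmlParse(sitemap_text, asText = TRUE)
ns       <- c(sm = "http://www.sitemaps.org/schemas/sitemap/0.9")
locs_xml <- xpathSApply(doc_xml, "//sm:loc", xmlValue, namespaces = ns)

print(length(anchors))
print(locs_html)
print(locs_xml)
```

Note that loc holds the page URL rather than a human-readable title, so any title would still have to be derived from the URL or from the page itself.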

Upvotes: 1
