Austin Trombley

R for web scraping: pull prices and names

I am trying to get the list of prices and game names from the Steam store search page at the URL below, but I can't figure out how `xpathSApply` should parse it:

http://store.steampowered.com/search/?sort_by=Price&sort_order=ASC&

Here is my code:

require(RCurl)
require(XML)
url <- "http://store.steampowered.com/search/results?sort_by=Name&sort_order=ASC&category1=1"
SOURCE <-  getURL(url,encoding="UTF-8") #Download the page
substring (SOURCE,1,200)
PARSED <- htmlParse(SOURCE) #Format the html code 
##My problem is in this line below 
(xpathSApply(PARSED, "//div[@class='col search_price']"))

Answers (1)

lukeA

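Try this. The main change is passing `xmlValue` as the function argument of `xpathSApply`, so each matched node is converted to its text (with `trim = TRUE` stripping surrounding whitespace) instead of being returned as a node object: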

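require(RCurl)
require(XML)
url <- "http://store.steampowered.com/search/?sort_by=Metascore&sort_order=DESC&"
SOURCE <- getURL(url, .encoding = "UTF-8") # Download the page; .encoding declares the response encoding
PARSED <- htmlParse(SOURCE, asText = TRUE, encoding = "utf-8") # Parse the downloaded string as HTML
# One XPath per output column: the price <div> inside each result link, and the <h4> holding the title
xpaths <- c(price="//a/div[@class='col search_price']", 
            title="//div[@class='col search_name ellipsis']/h4")
# xmlValue turns every matched node into trimmed text; sapply builds one column per XPath
res <- sapply(xpaths, function(x) xpathSApply(PARSED, x, xmlValue, trim = TRUE) )
head(res)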
#      price    title                        
# [1,] "9,99€"  "Half-Life 2"                
# [2,] "9,99€"  "Half-Life"                  
# [3,] "19,99€" "BioShock™"                  
# [4,] "18,99€" "The Orange Box"             
# [5,] "19,99€" "Portal 2"                   
# [6,] "14,99€" "The Elder Scrolls V: Skyrim"
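To see the same technique without downloading anything, here is a minimal sketch on a small hand-written HTML fragment. The markup, the game names ("Game A", "Game B") and the prices are hypothetical placeholders that only mimic the class names used above; they are not taken from the Steam store. The XPaths drop the leading `//a/` so the example does not depend on how the HTML parser nests `<div>` inside `<a>`, and the last step (converting to a data frame with numeric prices) is an addition for illustration, not part of the answer above.

```r
library(XML)

# Hypothetical markup: it only mimics the class names from the answer above.
html <- '<html><body>
<a href="#a"><div class="col search_name ellipsis"><h4>Game A</h4></div>
<div class="col search_price">  1.99  </div></a>
<a href="#b"><div class="col search_name ellipsis"><h4>Game B</h4></div>
<div class="col search_price">  4.99  </div></a>
</body></html>'

# asText = TRUE: treat the string as HTML content, not as a file name or URL
doc <- htmlParse(html, asText = TRUE)

xpaths <- c(price = "//div[@class='col search_price']",
            title = "//div[@class='col search_name ellipsis']/h4")

# xmlValue returns each matched node's text; trim = TRUE strips the padding
res <- sapply(xpaths, function(x) xpathSApply(doc, x, xmlValue, trim = TRUE))

# With two matches per XPath, sapply() returns a 2 x 2 character matrix
# whose column names come from names(xpaths); turn it into a data frame
games <- as.data.frame(res, stringsAsFactors = FALSE)
games$price <- as.numeric(games$price)

print(games)
```

Because `sapply` only produces a matrix when every XPath matches the same number of nodes, it is worth checking that the price and title columns line up (for example, a result without a price would shift every later row).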
