Web scraping with R to count images

How can I web scrape this page in R to count its images and videos? Sorry, I'm new to web scraping and would appreciate some help.

Here is the link: https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm

It should yield a video count of 1 and an image count of 9, but I've only been able to get this far:

library(dplyr)
library(rvest)
library(xml2)
library(selectr)
library(httr)

website <- read_html("https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm")
website %>% html_nodes("div.template.asset")

Answers (1)

ha-pu

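I think the main issue is that the body of the page, which contains all the images, is loaded dynamically (by JavaScript) only when the page is opened in a browser. As far as I know, rvest's read_html() only parses the HTML that the server sends and does not run JavaScript, so it never sees those elements. RSelenium is a bit more complicated to use, but because it drives a real browser it works well for this kind of problem.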
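To illustrate the point, here is a small offline sketch using rvest on two hand-written HTML strings. Both are hypothetical, heavily simplified stand-ins, not Kickstarter's real markup: the first mimics what a server sends for a JavaScript-rendered page (an empty container plus a script), the second mimics the page after a browser has run that script. `html_elements()` is the rvest 1.0 name for `html_nodes()`.

```r
library(rvest)

# Hypothetical server response: an empty container and a script that,
# in a real browser, would build the gallery. No media elements yet.
served_html <- '
<html><body>
  <div id="app"></div>
  <script>/* client-side code would insert the gallery here */</script>
</body></html>'

# Hypothetical DOM after a browser has executed the script.
rendered_html <- '
<html><body>
  <div id="app">
    <div class="template asset"><img alt="first"></div>
    <div class="template asset"><img alt="second"></div>
  </div>
</body></html>'

# read_html() only parses the markup it is given; it never runs the script,
# so the same selector finds nothing in the served version.
n_served   <- length(html_elements(read_html(served_html), "div.template.asset"))
n_rendered <- length(html_elements(read_html(rendered_html), "div.template.asset"))

n_served    # 0
n_rendered  # 2
```

This is why the selector from the question comes back empty even though the elements are visible in the browser: they only exist after rendering.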

library(RSelenium)

# open server session and open browser client ----
# adapt this to the browser (and version) installed on your machine
rD <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client

# navigate to the URL in the browser client ----
url <- "https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm"
remDr$navigate(url)

# access image elements using xpath ----
elements_image <- remDr$findElements(using = "xpath", "//*[@class='template asset']")
# using just "//img" would match all 12 images on the page
# for a while the XPath "//img[@class='fit lazyloaded']" worked as well, but it has since stopped matching

length(elements_image)
# show number of images on site
# [1] 9

unlist(lapply(elements_image, function(x) {x$getElementAttribute("alt")}))
# print the attribute "alt" to check output
# [1] "Vertical Columns hold 18 heads of greens, fed by a closed circuit of nutrient-rich water. "
# [2] ""                                                                                          
# [3] ""                                                                                          
# [4] "our farms can be found in unused, unloved, and unlikely spaces"                            
# [5] "Overtown 1 location at Lotus House homeless shelter"                                       
# [6] "led spectrum manipulated to encourage leafy-green growth"                                  
# [7] "vertical columns are harvested and planted in the same day to maximize efficiency "        
# [8] "Chef Aaron shows off some young Romaine"                                                   
# [9] "Farmer Thomas keeps the whole team moving"

# access video elements using xpath ----
elements_video <- remDr$findElements(using = "xpath", "//video")
# there is only one video, and its class string looks very specific to this player, so the generic "//video" XPath seems the more robust choice here

length(elements_video)
# show number of videos on site
# [1] 1

unlist(lapply(elements_video, function(x) {x$getElementAttribute("class")}))
# print the attribute "class" to check output
# [1] "aspect-ratio--object z1 has_hls hide"

# close browser client and close server session ----
remDr$close()
rD$server$stop()
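If you would rather keep the counting logic in rvest, one option is to let RSelenium render the page and then pass the rendered HTML to read_html(). The sketch below wraps that in a small helper; `count_media()` is a hypothetical name, and the XPath expressions are the ones used above. It is checked here on a hand-written HTML string (not Kickstarter's markup) so it runs without a browser.

```r
library(rvest)

# Hypothetical helper: count "template asset" elements and <video> tags
# in an HTML string that a browser has already rendered.
count_media <- function(html) {
  page <- read_html(html)
  c(
    images = length(html_elements(page, xpath = "//*[@class='template asset']")),
    videos = length(html_elements(page, xpath = "//video"))
  )
}

# Offline check on a small hand-written page.
demo_html <- '
<html><body>
  <div class="template asset"><img alt="a"></div>
  <div class="template asset"><img alt="b"></div>
  <video class="player"></video>
</body></html>'

count_media(demo_html)  # images: 2, videos: 1
```

With the RSelenium session still open (before remDr$close()), something like count_media(remDr$getPageSource()[[1]]) should give the same counts as the findElements() calls, provided the page has finished loading.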
