Reputation: 1
How can I scrape a web page in R to count the images and videos on it? I'm new to web scraping and would appreciate some help.
Here is the link: https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm
The result should be a video count of 1 and an image count of 9, but this is as far as I have been able to get:
library('dplyr')
library('rvest')
library('xml2')
library('selectr')
library("httr")
website<-read_html("https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm")
website%>%html_nodes("div.template.asset")
Upvotes: 0
Views: 109
Reputation: 581
I think the main issue is that the main body of the page, which contains all the images, is only loaded once the page is opened in a browser. As far as I know, rvest cannot handle this kind of site architecture, because it only sees the HTML that the server returns initially. Using RSelenium is a bit more complicated, but it works fine for this kind of problem.
library(RSelenium)

# open server session and open browser client ----
rD <- rsDriver(browser = c("firefox"), verbose = FALSE) # adapt this to your browser version
remDr <- rD$client

# navigate to url in browser client ----
url <- "https://www.kickstarter.com/projects/urban-farm-florida/hammock-greens-vertical-hydroponic-urban-farm"
remDr$navigate(url)

# access image elements using xpath ----
elements_image <- remDr$findElements(using = "xpath", "//*[@class='template asset']")
# using only "//img" would show all images on the site = 12
# for some time the xpath "//img[@class='fit lazyloaded']" worked as well, somehow it stopped doing so

length(elements_image) # show number of images on site
# [1] 9

unlist(lapply(elements_image, function(x) {x$getElementAttribute("alt")})) # print the attribute "alt" to check output
# [1] "Vertical Columns hold 18 heads of greens, fed by a closed circuit of nutrient-rich water. "
# [2] ""
# [3] ""
# [4] "our farms can be found in unused, unloved, and unlikely spaces"
# [5] "Overtown 1 location at Lotus House homeless shelter"
# [6] "led spectrum manipulated to encourage leafy-green growth"
# [7] "vertical columns are harvested and planted in the same day to maximize efficiency "
# [8] "Chef Aaron shows off some young Romaine"
# [9] "Farmer Thomas keeps the whole team moving"

# access video elements using xpath ----
elements_video <- remDr$findElements(using = "xpath", "//video")
# since there is only one video and since its class seems very specific for the video, "//video" seems the better solution here

length(elements_video) # show number of videos on site
# [1] 1

unlist(lapply(elements_video, function(x) {x$getElementAttribute("class")})) # print the attribute "class" to check output
# [1] "aspect-ratio--object z1 has_hls hide"

# close browser client and close server session ----
remDr$close()
rD$server$stop()
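If you would rather do the counting with the rvest functions from the question, one option is to let the browser render the page and then parse the rendered source. The following is only a minimal sketch: `count_media` is a hypothetical helper name, and `sample_html` is made-up markup that stands in for the rendered page so the example runs without a browser. It also uses the CSS selector `.template.asset`, which matches any element carrying both classes and is therefore slightly broader than the exact-match xpath above.

```r
library(xml2)
library(rvest)

# Hypothetical helper (not part of rvest or RSelenium): counts the elements
# with classes "template" and "asset", and the <video> tags, in an HTML string.
count_media <- function(page_source) {
  doc <- read_html(page_source)
  c(
    images = length(html_nodes(doc, ".template.asset")),
    videos = length(html_nodes(doc, "video"))
  )
}

# Made-up stand-in for the rendered page source (assumption, not the real markup).
sample_html <- '<html><body>
  <div class="template asset"><img alt="first image"></div>
  <div class="template asset"><img alt="second image"></div>
  <video class="example-video"></video>
</body></html>'

counts <- count_media(sample_html)
print(counts)
```

With an open RSelenium session (before `remDr$close()`), the rendered source should be available as `remDr$getPageSource()[[1]]`, so the call would be `count_media(remDr$getPageSource()[[1]])`.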
Upvotes: 1