Renin RK

Reputation: 111

Unable to extract image links using Rvest

I am unable to extract image links from a website.

I am new to data scraping. I have used SelectorGadget as well as the browser's Inspect Element tool to find the class of the image, but to no avail.

library(rvest)

main.page <- read_html(x = "https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974")
urls <- main.page %>% 
  html_nodes(".match-detail--item:nth-child(9) .lazyloaded") %>%
  html_attr("src")

sotu <- data.frame(urls = urls)

I am getting the following output:

<0 rows> (or 0-length row.names)

Upvotes: 1

Views: 626

Answers (2)

QHarr

Reputation: 84465

The DOM is modified via JavaScript (the site uses React) when the page is rendered in a browser, so rvest does not see the same layout that you see in the browser. A less optimal alternative is to use a regex to pull the information out of the JavaScript object that houses the links, and then use a JSON parser to extract the links.

library(rvest)
library(jsonlite)
library(stringr)
library(magrittr)

url <- "https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974"

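# Collapse the text of the page body (including the embedded
# JavaScript object) into a single string
r <- read_html(url) %>% 
  html_nodes('body') %>% 
  html_text() %>% 
  toString()

# Lazily capture the JSON array that follows the "debuts" key
x <- str_match_all(r, 'debuts":(.*?\\])')
# Column 2 of the match matrix holds the captured group
json <- jsonlite::fromJSON(x[[1]][, 2])
print(json$imgicon)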
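
To try the regex-then-JSON step without hitting the live site, here is a minimal sketch on a made-up string. The `page_text` value, the `window.__DATA__` wrapper, the player names, and the example.com links are all hypothetical stand-ins; only the `debuts` and `imgicon` keys come from the code above, and the real page's structure may differ.

```r
library(stringr)
library(jsonlite)

# Hypothetical stand-in for the body text returned by html_text()
page_text <- 'window.__DATA__ = {"match":{"debuts":[{"name":"Player A","imgicon":"https://example.com/a.png"},{"name":"Player B","imgicon":"https://example.com/b.png"}],"venue":"Leeds"}};'

# Lazily capture the array that follows the "debuts" key,
# stopping at the first closing bracket
m <- str_match(page_text, '"debuts":(\\[.*?\\])')
debuts_json <- m[1, 2]

# str_match() returns NA when the pattern is not found
if (is.na(debuts_json)) {
  stop("Pattern not found; the page structure may have changed.")
}

# An array of objects is parsed into a data frame, one column per key
debuts <- fromJSON(debuts_json)
links <- debuts$imgicon
print(links)
```

Because the lazy match stops at the first `]`, this only works while the captured array contains no nested arrays; if it did, the captured text would be cut short and `fromJSON()` would fail.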

Upvotes: 0

user10191355

Certain classes and attributes don't show up in the scraped data, most likely because they are added by JavaScript after the page loads. Just target img instead of .lazyloaded and data-src instead of src:

library(rvest)

main.page <- read_html("https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974")

main.page %>% 
    html_nodes(".match-detail--item:nth-child(9) img") %>%
    html_attr("data-src")

#### OUTPUT ####

 [1] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/1.png&h=25&w=25"
 [2] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [3] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [4] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [5] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [6] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [7] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [8] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
 [9] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[10] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[11] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[12] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
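
If some images on a page carry `data-src` and others only `src`, one option is to read both attributes and fall back from one to the other. The sketch below runs on a small inline HTML snippet rather than the live site; the markup and the example.com URLs are made up for illustration, and only the `.match-detail--item` class is taken from the selector above.

```r
library(rvest)

# Hypothetical markup: one lazy-loaded image and one ordinary image
html <- '<div class="match-detail--item">
  <img class="lazyload" data-src="https://example.com/logo1.png">
  <img src="https://example.com/logo2.png">
</div>'

page <- read_html(html)
imgs <- html_nodes(page, ".match-detail--item img")

# html_attr() returns NA where an attribute is missing
data_src <- html_attr(imgs, "data-src")
src <- html_attr(imgs, "src")

# Prefer data-src and fall back to src
img_urls <- ifelse(is.na(data_src), src, data_src)
print(img_urls)
```

Wrapping the result in `unique()` would drop repeated links such as the duplicated team logos in the output above.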

Upvotes: 2
