rvest web scraping from html page

Question

I'm trying to scrape bookmakers' odds from this page:

https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1

So far I have written the following code:

library(rvest)

interwetten <- read_html("https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1")
bundesliga <- html_nodes(interwetten, xpath = '//*[@id="TBL_Content_1019"]')
bundesliga_teams <- html_nodes(bundesliga, "span")

The output I get is:

[1] VfB Stuttgart
[4] X

Now I want to extract the team name inside every <span>, but I don't know how to do it. I tried using nodes and attributes, but that didn't work.
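Why attributes don't help here: the team name is the text content of each <span>, not the value of one of its attributes. The minimal sketch below runs offline against a small hypothetical HTML snippet (the itemprop="name" markup is an assumption about how the page was structured, not something verified against the live site) and contrasts html_attr() with html_text().

```r
library(rvest)

# Hypothetical snippet: one team-name span marked with itemprop="name" and one
# plain span holding the draw label "X". The live page's markup is assumed.
page <- read_html('<div id="TBL_Content_1019">
  <span itemprop="name">VfB Stuttgart</span>
  <span>X</span>
</div>')

spans <- html_nodes(page, "span")

# The attribute value is only the marker "name" (NA where the span has none)
html_attr(spans, "itemprop")
#> [1] "name" NA

# The team name is the element's text content
html_text(spans)
#> [1] "VfB Stuttgart" "X"
```

Note that html_text() on all spans also returns the "X" label, so the selector itself has to be narrowed to keep only the team names.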

alistaire · Accepted Answer

You can make the XPath selector more specific, so that it matches only the <span> elements with itemprop="name", and then extract their text with html_text(), e.g.:

library(rvest)

interwetten <- 'https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1' %>% 
    read_html() 

teams <- interwetten %>% 
    html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>% 
    html_text()

teams
#>  [1] "VfB Stuttgart"   "1. FC Cologne"   "Mainz 05"       
#>  [4] "Hamburger SV"    "Hertha BSC"      "Schalke 04"     
#>  [7] "Hannover 96"     "Frankfurt"       "Hoffenheim"     
#> [10] "Augsburg"        "Bayern Munich"   "Freiburg"       
#> [13] "Dortmund"        "RB Leipzig"      "Leverkusen"     
#> [16] "Wolfsburg"       "Werder Bremen"   "Monchengladbach"
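To see what the [@itemprop="name"] predicate does without depending on the live page (whose markup may have changed), here is a minimal sketch against a hypothetical snippet that assumes the same structure. It shows that the predicate drops the "X" draw label, and that an equivalent CSS selector gives the same result.

```r
library(rvest)

# Hypothetical stand-in for part of the Bundesliga table; structure is assumed
page <- read_html('<div id="TBL_Content_1019">
  <span itemprop="name">VfB Stuttgart</span>
  <span>X</span>
  <span itemprop="name">1. FC Cologne</span>
</div>')

# XPath from the answer: only spans carrying itemprop="name" under the container
teams_xpath <- page %>%
  html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>%
  html_text(trim = TRUE)

# Equivalent CSS selector (id + descendant + attribute selector)
teams_css <- page %>%
  html_nodes('#TBL_Content_1019 span[itemprop="name"]') %>%
  html_text(trim = TRUE)

teams_xpath
#> [1] "VfB Stuttgart" "1. FC Cologne"
identical(teams_xpath, teams_css)
#> [1] TRUE
```

trim = TRUE strips surrounding whitespace, which is harmless here and useful if the real markup pads the names with spaces or newlines.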
