Reputation: 223
I'm trying to scrape bookmakers' odds from this page:
https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1
This is the code I have written so far:
library(rvest)
interwetten <- read_html("https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1")
bundesliga <- html_nodes(interwetten, xpath = '//*[@id="TBL_Content_1019"]')
bundesliga_teams <- html_nodes(bundesliga, "span")
The output I get is:
[1] <span id="ctl00_cphMain_UCOffer_LeagueList_rptLeague_ctl00_ucBettingContainer_lblClose" clas ...
[2] <span itemscope="itemscope" itemprop="location" itemtype="http://schema.org/Place"><meta ite ...
[3] <span itemprop="name">VfB Stuttgart</span>
[4] <span>X</span>
Now I want to extract the team name inside every <span itemprop="name"></span>,
but I don't know how to do it. I tried using nodes and attrs, but that didn't work.
Upvotes: 1
Views: 258
Reputation: 43364
You can make the XPath selector more specific, so that it matches only the spans with itemprop="name", and then use html_text() to pull out their text, e.g.
library(rvest)
interwetten <- 'https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1' %>%
  read_html()
teams <- interwetten %>%
  html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>%
  html_text()
teams
#> [1] "VfB Stuttgart" "1. FC Cologne" "Mainz 05"
#> [4] "Hamburger SV" "Hertha BSC" "Schalke 04"
#> [7] "Hannover 96" "Frankfurt" "Hoffenheim"
#> [10] "Augsburg" "Bayern Munich" "Freiburg"
#> [13] "Dortmund" "RB Leipzig" "Leverkusen"
#> [16] "Wolfsburg" "Werder Bremen" "Monchengladbach"
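Since the live page may change or be unavailable, here is a minimal offline sketch of the same idea. The `sample_html` string is a hypothetical stand-in modelled on the spans shown in the question, not the site's real markup. It also shows what should be an equivalent CSS selector, in case you prefer that to XPath.

```r
library(rvest)

# Hypothetical stand-in for the live page (an assumption, not the real markup):
# one container with the id from the question and a mix of span elements.
sample_html <- paste0(
  '<table id="TBL_Content_1019"><tr>',
  '<td><span itemprop="name">VfB Stuttgart</span></td>',
  '<td><span>X</span></td>',
  '<td><span itemprop="name">Hertha BSC</span></td>',
  '</tr></table>'
)

page <- read_html(sample_html)

# XPath: any span with itemprop="name" below the element with the given id
teams_xpath <- page %>%
  html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>%
  html_text()

# CSS: the same selection written as a CSS selector
teams_css <- page %>%
  html_nodes('#TBL_Content_1019 span[itemprop="name"]') %>%
  html_text()

print(teams_xpath)
print(teams_css)
```

Both selectors skip the plain <span>X</span> because it has no itemprop attribute, which is why filtering on the attribute is simpler than selecting every span and cleaning up afterwards.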
Upvotes: 1