Reputation: 25
I'm trying to scrape some tennis stats with R from multiple links using rvest and SelectorGadget. The page I'm scraping is http://www.atpworldtour.com/en/scores/archive/stockholm/429/2017/results, which has 29 links of the form "http://www.atpworldtour.com/en/scores/2017/429/MS001/match-stats". The links are identical except for the match ID, which runs from MS001 to MS029. With the code below I get the desired result for only the first 9 links. I can see the problem but don't know how to fix it: the first 9 IDs need two leading zeros and the rest need only one, so the 10th link should be MS010, but my code produces MS0010. Any help is much appreciated.
library(xml2)
library(rvest)
library(stringr)
round <- 1:29
urls <- paste0("http://www.atpworldtour.com/en/scores/2017/429/MS00", round,
               "/match-stats")
aces <- function(url) {
  url %>%
    read_html() %>%
    html_nodes(".percent-on:nth-child(3) .match-stats-number-left span") %>%
    html_text() %>%
    as.numeric()
}
results <- sapply(urls, aces)
results
$`http://www.atpworldtour.com/en/scores/2017/429/MS001/match-stats`
[1] 9
$`http://www.atpworldtour.com/en/scores/2017/429/MS002/match-stats`
[1] 8
$`http://www.atpworldtour.com/en/scores/2017/429/MS003/match-stats`
[1] 5
$`http://www.atpworldtour.com/en/scores/2017/429/MS004/match-stats`
[1] 4
$`http://www.atpworldtour.com/en/scores/2017/429/MS005/match-stats`
[1] 8
$`http://www.atpworldtour.com/en/scores/2017/429/MS006/match-stats`
[1] 9
$`http://www.atpworldtour.com/en/scores/2017/429/MS007/match-stats`
[1] 2
$`http://www.atpworldtour.com/en/scores/2017/429/MS008/match-stats`
[1] 9
$`http://www.atpworldtour.com/en/scores/2017/429/MS009/match-stats`
[1] 5
$`http://www.atpworldtour.com/en/scores/2017/429/MS0010/match-stats`
numeric(0)
Upvotes: 1
Views: 1022
Reputation: 10875
You can generate leading zeros with the sprintf() function: the format
specifier %03d pads an integer with zeros to a width of three digits.
ids <- 1:29
urlList <- sapply(ids, function(x) {
  sprintf("%s%03d%s", "http://www.atpworldtour.com/en/scores/2017/429/MS",
          x, "/match-stats")
})
# print a few items
urlList[c(1,9,10,29)]
...and the output:
> urlList[c(1,9,10,29)]
[1] "http://www.atpworldtour.com/en/scores/2017/429/MS001/match-stats"
[2] "http://www.atpworldtour.com/en/scores/2017/429/MS009/match-stats"
[3] "http://www.atpworldtour.com/en/scores/2017/429/MS010/match-stats"
[4] "http://www.atpworldtour.com/en/scores/2017/429/MS029/match-stats"
>
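Since sprintf() is vectorised over its arguments, the same URLs can probably be built without sapply(). The following is a minimal sketch under that assumption; the names `base_url`, `urls`, and `urls_alt` are illustrative and not from the original posts, and formatC() is shown only as a base-R alternative for the zero padding.

```r
ids <- 1:29
base_url <- "http://www.atpworldtour.com/en/scores/2017/429/MS"  # illustrative name

# sprintf() recycles the scalar base_url across all 29 ids;
# %03d left-pads each integer with zeros to three digits
urls <- sprintf("%s%03d/match-stats", base_url, ids)

# Base-R alternative: formatC() with flag = "0" pads with zeros to the given width
urls_alt <- paste0(base_url, formatC(ids, width = 3, flag = "0"), "/match-stats")

# Both approaches should yield the same character vector
identical(urls, urls_alt)

# Spot-check the ids around the one-digit / two-digit boundary
urls[c(1, 9, 10, 29)]
```

The resulting `urls` vector can then be passed to the `aces()` function from the question, for example with `results <- lapply(urls, aces)`.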
Upvotes: 1