Reputation: 743
I'm trying to scrape the links off this Wayback Machine search page:
library(rvest)
library(tidyverse)

url <- read_html('https://web.archive.org/web/*/https://www.bjjcompsystem.com/tournaments/1869/categories*')

get_links <- url %>%
  html_nodes('#resultsUrl a') %>%
  html_attr('href') %>%
  paste0('https://web.archive.org/web/20220000000000*/', .)

get_links
But all I get is character(0). I even tried selecting the li class, as was suggested to me before, but that returned nothing useful either.
Can someone explain what I'm doing wrong and how to fix it?
Reputation: 6583
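The results list on that page is filled in by JavaScript after the page loads, so the HTML that read_html() downloads contains no #resultsUrl links, which is why you get character(0). Instead, get the links from their source, the timemap JSON endpoint that the page itself queries: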
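library(tidyverse)
library(httr2)
library(janitor)

# Query the Wayback Machine timemap endpoint directly instead of scraping the page
"https://web.archive.org/web/timemap/json?url=https://www.bjjcompsystem.com/tournaments/1869/categories&matchType=prefix&collapse=urlkey&output=json&fl=original,mimetype,timestamp,endtimestamp,groupcount,uniqcount&filter=!statuscode:[45]..&limit=10000&_=1663136483842" %>%
  request() %>%
  req_perform() %>%
  # The response is a JSON array of rows; simplifyVector turns it into a matrix
  resp_body_json(simplifyVector = TRUE) %>%
  as_tibble() %>%
  # The first row holds the field names, so promote it to column names
  row_to_names(1)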
# A tibble: 784 × 6
original mimet…¹ times…² endti…³ group…⁴ uniqc…⁵
<chr> <chr> <chr> <chr> <chr> <chr>
1 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 3 3
2 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 6 6
3 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 2 2
4 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 1 1
5 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 2 2
6 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 1 1
7 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 1 1
8 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 2 2
9 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 1 1
10 https://www.bjjcompsystem.com/tournaments/1869/ca… text/h… 202209… 202209… 1 1
# … with 774 more rows, and abbreviated variable names ¹mimetype, ²timestamp, ³endtimestamp,
# ⁴groupcount, ⁵uniqcount
# ℹ Use `print(n = ...)` to see more rows
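The table holds the archived page URLs in `original` and the capture times in `timestamp`. To get clickable archive links like the ones the question was trying to build, you can paste those two columns into Wayback snapshot URLs of the form `https://web.archive.org/web/<timestamp>/<original>`. This is a minimal sketch, assuming the tibble above has been saved to a variable; the helper name `snapshot_urls()` is hypothetical, not part of any package.

```r
# Hypothetical helper: build Wayback Machine snapshot URLs of the form
# https://web.archive.org/web/<timestamp>/<original>
# from the `timestamp` and `original` columns of the timemap table
snapshot_urls <- function(timemap) {
  paste0("https://web.archive.org/web/", timemap$timestamp, "/", timemap$original)
}
```

For example, if the pipeline's result is assigned to `timemap`, then `links <- snapshot_urls(timemap)` gives one snapshot link per row.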