Reputation: 507
My goal is to extract the URLs associated with specific CSS elements on a website using rvest. After looking at several similar questions, I think I need to use the html_attr function with the 'href' argument. With my present script this only returns NA values, although I would expect it to return URLs.
Input to build variables
library(rvest)
my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"
my_read_url <- read_html(my_url)
my_nodes <- html_nodes(my_read_url, ".fighter_result_data a span , .right_side a span , .left_side a span")
Input to check that my_nodes is selecting the names of the athletes:
html_text(my_nodes)
Output showing that my_nodes is selecting the CSS elements I want:
[1] "Holly Holm" "Bethe Correia" "Marcin Tybura"
[4] "Andrei Arlovski" "Colby Covington" "Dong Hyun Kim"
[7] "Rafael dos Anjos" "Tarec Saffiedine" "Jon Tuck"
[10] "Takanori Gomi" "Walt Harris" "Cyril Asker"
[13] "Alex Caceres" "Rolando Dy" "Yuta Sasaki"
[16] "Justin Scoggins" "Jingliang Li" "Frank Camacho"
[19] "Russell Doane" "Kwan Ho Kwak" "Naoki Inoue"
[22] "Carls John de Tomas" "Lucie Pudilova" "Ji Yeon Kim"
Input to try to get the urls to each of the athletes' unique pages.
html_attr(my_nodes, "href")
Output showing that my attempt only returns a vector of NA values:
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Can anyone help me obtain the URLs instead of these NA values? Thank you!
Upvotes: 3
Views: 6246
Reputation: 3173
Similar to @MrFlick's answer: the links are stored in the <a> elements, so you have to select those elements to access them.
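library(rvest)  # also provides the %>% pipe
my_url %>%
  read_html() %>%
  html_nodes('.fighter_result_data') %>%  # the fighter result blocks
  html_nodes('a') %>%                      # the <a> links inside each block
  html_attr('href')                        # the href attribute of each link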
[1] "/fighter/Marcin-Tybura-86928" "/fighter/Andrei-Arlovski-270"
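The values above are paths relative to the site root. If you need full URLs (for example, to call read_html() on each fighter page), a minimal sketch is to resolve them against the event page with xml2::url_absolute(); xml2 is installed as an rvest dependency. The two paths are copied from the output above, and the live request is left out so the snippet runs offline.

```r
library(xml2)

# Event page from the question, used as the base for resolving relative links
my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"

# Relative paths as returned by html_attr(..., 'href') in this answer
fighter_paths <- c("/fighter/Marcin-Tybura-86928", "/fighter/Andrei-Arlovski-270")

# Resolve each root-relative path against the event page's scheme and host
fighter_urls <- url_absolute(fighter_paths, my_url)
print(fighter_urls)
```

Because each path starts with "/", only the scheme and host of my_url are kept, so the results point to http://www.sherdog.com/fighter/....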
Upvotes: 1
Reputation: 206411
You are selecting the span elements, not the a elements, in your html_nodes command. The span elements do not have an href= attribute; only the a elements do. Instead, use:
my_nodes <- html_nodes(my_read_url, ".fighter_result_data a, .right_side a, .left_side a")
html_text(my_nodes)
html_attr(my_nodes, "href")
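To see the difference without hitting the live site, here is a small offline sketch. The HTML is a hypothetical mock-up of the pattern described above (a <span> with the name nested inside an <a href="...">), reusing the two fighter paths shown in the other answer; the real page has much more markup.

```r
library(rvest)

# Hypothetical mock-up: each fighter name is a <span> inside an <a href> link
page <- read_html('<html><body>
  <div class="fighter_result_data">
    <a href="/fighter/Marcin-Tybura-86928"><span>Marcin Tybura</span></a>
    <a href="/fighter/Andrei-Arlovski-270"><span>Andrei Arlovski</span></a>
  </div>
</body></html>')

# Selecting the <span> children: the text is there, but there is no href
spans <- html_nodes(page, ".fighter_result_data a span")
span_hrefs <- html_attr(spans, "href")   # NA for every node

# Selecting the <a> elements themselves: same text, and href is available
links <- html_nodes(page, ".fighter_result_data a")
link_text <- html_text(links)
link_hrefs <- html_attr(links, "href")
print(link_hrefs)
```

html_attr() returns NA for any node that lacks the requested attribute, which is why the original span selector produced only NA values.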
Upvotes: 9