Jayden.Cameron

Reputation: 507

html_attr "href" returns NA in rvest

My goal is to extract the URLs associated with specific CSS elements on a website using rvest. After looking at several similar questions, I think I need to use the html_attr function with the "href" argument. With my current script this returns only NA values, although I would expect it to return URLs.

Input to build variables

library(rvest)

my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"

my_read_url <- read_html(my_url)

my_nodes <- html_nodes(my_read_url, ".fighter_result_data a span , .right_side a span , .left_side a span")

Input to check that my_nodes is picking up the names of the athletes.

html_text(my_nodes)

Output showing my_nodes is selecting the CSS elements I want.

 [1] "Holly Holm"          "Bethe Correia"       "Marcin Tybura"      
 [4] "Andrei Arlovski"     "Colby Covington"     "Dong Hyun Kim"      
 [7] "Rafael dos Anjos"    "Tarec Saffiedine"    "Jon Tuck"           
[10] "Takanori Gomi"       "Walt Harris"         "Cyril Asker"        
[13] "Alex Caceres"        "Rolando Dy"          "Yuta Sasaki"        
[16] "Justin Scoggins"     "Jingliang Li"        "Frank Camacho"      
[19] "Russell Doane"       "Kwan Ho Kwak"        "Naoki Inoue"        
[22] "Carls John de Tomas" "Lucie Pudilova"      "Ji Yeon Kim"  

Input to try to get the urls to each of the athletes' unique pages.

html_attr(my_nodes, "href")

Output showing that my attempt only returns a list of NA values

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Can anyone help me actually obtain the urls instead of these NA values? Thank you!

Upvotes: 3

Views: 6246

Answers (2)

Nad Pat

Reputation: 3173

Similar to @MrFlick's answer: the links are in the <a> elements, so those are what you have to select.

my_url %>%
  read_html() %>%
  html_nodes('.fighter_result_data') %>%
  html_nodes('a') %>%
  html_attr('href')
[1] "/fighter/Marcin-Tybura-86928"        "/fighter/Andrei-Arlovski-270"   
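
The hrefs returned here are site-relative paths. If full URLs are needed, one possible way to resolve them is xml2::url_absolute (xml2 is installed alongside rvest). This is only a minimal sketch: it assumes the event page URL from the question as the base, and the two paths are copied from the output above rather than scraped live.

```r
library(xml2)

# Base URL taken from the question
my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"

# Relative hrefs copied from the output above (not scraped here)
rel_links <- c("/fighter/Marcin-Tybura-86928", "/fighter/Andrei-Arlovski-270")

# Resolve each relative path against the base URL
abs_links <- url_absolute(rel_links, my_url)
abs_links
```

In a real script, rel_links would be the result of the html_attr('href') call.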

Upvotes: 1

MrFlick

Reputation: 206411

Your html_nodes call selects the span elements, not the a elements. The span elements have no href attribute; only the a elements do. Instead use:

my_nodes <- html_nodes(my_read_url, ".fighter_result_data a, .right_side a, .left_side a")
html_text(my_nodes)
html_attr(my_nodes, "href")
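
To see the difference without hitting the site, here is a small offline sketch. The markup is made up for illustration (it only mimics the a > span nesting implied by the selectors in the question, not the real page), and minimal_html assumes rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical markup: each link wraps a span, as the question's selectors suggest
page <- minimal_html('
  <div class="fighter_result_data">
    <a href="/fighter/Example-One-1"><span>Example One</span></a>
    <a href="/fighter/Example-Two-2"><span>Example Two</span></a>
  </div>
')

# Selecting the spans: the text is found, but a span carries no href
span_nodes <- html_nodes(page, ".fighter_result_data a span")
span_hrefs <- html_attr(span_nodes, "href")

# Selecting the links themselves: text and href are both available
a_nodes <- html_nodes(page, ".fighter_result_data a")
a_text  <- html_text(a_nodes)
a_hrefs <- html_attr(a_nodes, "href")
```

Here span_hrefs is all NA, while a_hrefs holds the two paths. html_text still returns the names for the a nodes because it collects the text of the nested span.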

Upvotes: 9
