Reputation: 83
Here's my problem: I've generated a list containing a large number of links, and I want to apply a function to that list to scrape some data from all of those links. However, when I run the program it only takes the data from the first link of each element, reprinting that info for the correct number of iterations. Here's all my code so far:
library(tidyverse)
library(rvest)
source_link<-"http://www.ufcstats.com/statistics/fighters?char=a&page=all"
source_link_html<-read_html(source_link)
#This scrapes all the links for the pages of all the fighters
links_vector<-source_link_html%>%
html_nodes("div ul li a")%>%
html_attr("href")%>%
#This seq selects the 26 needed links, i.e. from a-z
.[1:26]
#Modifies the pulled data so the links become usable and contain all the fighters instead of just some
links_vector_modded<-str_c("http://www.ufcstats.com", links_vector,"&page=all")
fighter_links<-sapply(links_vector_modded, function(links_vector_modded){
read_html(links_vector_modded[])%>%
html_nodes("tr td a")%>%
html_attr("href")%>%
.[seq(1,length(.),3)]%>%
na.omit(fighter_links)
})
###Next Portion: Using the above links to further harvest
#Take all the links within an element of fighter_links and run it through the function career_data to scrape all the statistics from said pages.
fighter_profiles_a<-map(fighter_links$`http://www.ufcstats.com/statistics/fighters?char=a&page=all`, function(career_data){
#Below is where I believe my problem lies
read_html()%>%
html_nodes("div ul li")%>%
html_text()
})
The issue I'm having is in the last section of code, read_html(). I do not know how to apply each link in the element within the list to that function. Additionally, is there a way to call all of the elements of fighter_links at once instead of doing it one element at a time?
Thank you for any advice and assistance!
Upvotes: 1
Views: 301
Reputation: 21274
The challenge is that fighter_links is a list of vectors. Applying map to each list element leaves you with a vector of URLs, and you want to get information from each URL. If it's important to retain the structure of fighter_links - meaning, you don't lose which URL belongs to each fighter - you can nest your call to map, like this:
fighter_profiles <-
  fighter_links %>%
  map(function(url_list) {
    map(url_list,
        function(url) read_html(url) %>%
          html_nodes("div ul li") %>%
          html_text() %>%
          str_replace_all(., "\n\\s+\n\\s+", "")) # a little clean up
  })
This produces nested output, which you can use to keep track of which fighter_links entry it came from:
[[1]]
[[1]][[1]]
[1] "Height:--\n " "Weight:155 lbs.\n " "Reach:--\n " "STANCE:"
[5] "DOB:Jul 13, 1978" "SLpM:0.00\n\n " "Str. Acc.:0%\n " "SApM:0.00\n "
[9] "Str. Def:0%\n " "" "TD Avg.:0.00\n " "TD Acc.:0%\n "
[13] "TD Def.:0%\n " "Sub. Avg.:0.0\n " "Events & Fights" "Fighters"
[17] "Stat Leaders"
[[1]][[2]]
[1] "Height:--\n " "Weight:155 lbs.\n " "Reach:--\n " "STANCE:"
[5] "DOB:Jul 13, 1978" "SLpM:0.00\n\n " "Str. Acc.:0%\n " "SApM:0.00\n "
[9] "Str. Def:0%\n " "" "TD Avg.:0.00\n " "TD Acc.:0%\n "
[13] "TD Def.:0%\n " "Sub. Avg.:0.0\n " "Events & Fights" "Fighters"
[17] "Stat Leaders"
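If you also want each profile keyed by the fighter URL it was scraped from, rather than by position, one option is to name the inner results as you go. This is a minimal sketch, not part of the original code; it assumes fighter_links is a named list (which the sapply() call above produces) and that stringr is loaded via the tidyverse:
library(purrr)
library(rvest)
library(stringr)
# Sketch: same nested map, but name each profile by its fighter URL.
# The outer names (the page URLs) carry over automatically, since map()
# preserves the names of a named input list.
fighter_profiles_named <-
  fighter_links %>%
  map(function(url_list) {
    url_list %>%
      map(~ read_html(.x) %>%
            html_nodes("div ul li") %>%
            html_text() %>%
            str_replace_all("\n\\s+\n\\s+", "")) %>%
      set_names(url_list) # inner names = fighter URLs
  })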
Note: You can use map instead of the initial sapply as well, if you like:
path <- "http://www.ufcstats.com/statistics/fighters"
query_str <- paste0("?char=", letters, "&page=all")
urls <- paste0(path, query_str)

get_fighter_link <- function(url) {
  read_html(url) %>%
    html_nodes("tr td a") %>%
    html_attr("href") %>%
    .[seq(1, length(.), 3)] %>%
    na.omit()
}
fighter_links <- map(urls, get_fighter_link)
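One small difference from sapply: map() does not name its output after a character input, so if you want to keep track of which letter each element corresponds to, you could name the result yourself. A small sketch, not part of the original code:
# Hypothetical follow-up: index fighter_links by letter instead of position
fighter_links <- set_names(map(urls, get_fighter_link), letters)
fighter_links[["a"]] # fighter page URLs scraped from the "a" listing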
Upvotes: 1
Reputation: 389135
You can unlist to get all the fighter_links together and pass the result to the map function to extract the relevant text.
library(rvest)
library(purrr)
fighter_profiles_a <- map(unlist(fighter_links), function(career_data) {
  read_html(career_data) %>%
    html_nodes("div ul li") %>%
    html_text()
})
The text captured in fighter_profiles_a might require some additional cleaning.
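For example, here is a minimal sketch of one possible cleanup pass (an assumption about the kind of cleaning you want, not part of the answer): str_squish() from stringr trims each string and collapses the embedded newlines and runs of spaces.
library(stringr)
# Collapse the "\n" runs and extra whitespace in every scraped field
fighter_profiles_a_clean <- map(fighter_profiles_a, str_squish)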
Upvotes: 1