Scraping list in R

Question

I want to scrape a list of elements (player name, cost, buyer, seller, day) from a local HTML file, but I have a problem with the indices 2 and 3 in the selectors below when I try to scrape the seller and buyer (for the 1st transfer 'computer' and 'peter', and for the 2nd transfer 'computer' and 'james'):

document.querySelector("#pressReleases > ul > li:nth-child(**2**) > ul > li.text > div > strong:nth-child(2)")

document.querySelector("#pressReleases > ul > li:nth-child(**3**) > ul > li.text > div > strong:nth-child(2)")

How can I scrape the li elements while making this index variable?

I've tried this in R:

library(rvest)

# mylocalfile is a placeholder for the parsed local HTML document
dades <- mylocalfile

player <- dades %>% html_nodes("ul.player li.text strong") %>% html_text() %>% trimws()
cost <- dades %>% html_nodes("ul.player li.text span") %>% html_text() %>% trimws()
buyer <- dades %>% html_nodes("#pressReleases > ul > li:nth-child(2) > ul > li.text > div > strong:nth-child(2)") %>% html_text() %>% trimws()
seller <- dades %>% html_nodes("#pressReleases > ul > li:nth-child(2) > ul > li.text > div > strong:nth-child(1)") %>% html_text() %>% trimws()
day <- dades %>% html_nodes("ul.player li.text time") %>% html_text() %>% trimws()

I noticed that the 2 in #pressReleases > ul > li:nth-child(2) is different for each li class="post pressRelease".

The HTML code:



 
   Fitxatges del dia
    09/08/2019
  
  
    
      
        
        
      
      
         Player1
         09/08/2019 - 05:30
         16.245.485 €
         
           D'
         computer
           a 
         peter
        
       
      
      
     
     
        
        2º puja
        matheu:
        15.925.828 €
     
  
  
    
      
        
        
      
      
       Player2
       09/08/2019 - 05:30
       1.111.711 €
       
          D'
         computer
          a 
         james

niko · Accepted Answer

Here is a possible solution to get the buyer/seller:

# Read the local file
URL <- 'D:/Test/Test.html'
wp <- xml2::read_html(URL, encoding = 'utf-8')
# Extract the relevant nodes
node <- rvest::html_nodes(wp, '.from')
# Extract the names
seller <- gsub('.*D\'\n\\s+(.*?)\n\\s+a\\s?\n\\s+(.*?)\n.*', '\\1', rvest::html_text(node))
# [1] "computer" "computer"
buyer <- gsub('.*D\'\n\\s+(.*?)\n\\s+a\\s?\n\\s+(.*?)\n.*', '\\2', rvest::html_text(node))
# [1] "peter" "james"
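
As an alternative to the regular expression, the changing nth-child() index on the outer li can probably be dropped altogether: select every transfer li first, then look up the seller and buyer inside each one. The sketch below is only an illustration. The markup in html is hypothetical, reconstructed from the selectors in the question (the class names post, pressRelease, player, text and from are taken from the question and the answer above, the nesting is an assumption), so the selectors may need adjusting for the real file.

```r
library(rvest)

# Hypothetical markup, reconstructed from the selectors in the question;
# the real file may be structured differently.
html <- '
<div id="pressReleases">
  <ul>
    <li class="post">Fitxatges del dia</li>
    <li class="post pressRelease">
      <ul class="player">
        <li class="text">
          <strong>Player1</strong>
          <div class="from">D\'<strong>computer</strong> a <strong>peter</strong></div>
        </li>
      </ul>
    </li>
    <li class="post pressRelease">
      <ul class="player">
        <li class="text">
          <strong>Player2</strong>
          <div class="from">D\'<strong>computer</strong> a <strong>james</strong></div>
        </li>
      </ul>
    </li>
  </ul>
</div>'

doc <- read_html(html)

# One node per transfer: no nth-child() index on the outer li
transfers <- html_nodes(doc, "#pressReleases > ul > li.pressRelease")

# html_node() returns one match (or NA) per transfer, so the vectors stay aligned
player <- transfers %>% html_node("li.text > strong") %>% html_text() %>% trimws()
seller <- transfers %>% html_node("li.text > div > strong:nth-child(1)") %>% html_text() %>% trimws()
buyer  <- transfers %>% html_node("li.text > div > strong:nth-child(2)") %>% html_text() %>% trimws()

# Combine the aligned vectors into one row per transfer
data.frame(player, seller, buyer)
```

Because html_node() is applied to each transfer separately, a transfer with a missing seller or buyer would give NA in that row rather than shifting the remaining values.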
