Jinesh Patel

Reputation: 165

Extracting an embedded URL from a website's HTML code in R

I am trying to read the file named "North America Rotary Rig Count Pivot Table (Feb 2011 - Current)" into R. However, Baker Hughes changes the file's URL slightly each week, so I cannot just copy and paste the URL into my code. Is it possible to access the website's HTML code from R and find the URL there? If not, what is the best way to get the URL other than manually copying and pasting it each week? The page that lists the file is:

http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother

Upvotes: 0

Views: 119

Answers (1)

danielson

Reputation: 1029

Here is a slight amendment to code I have used to pull all links from a website. It collects the href attribute of every <a> tag along with the link text displayed on the page, then looks up the row whose text matches the file name. It should suffice, though there could be a more efficient solution for finding a single link.

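library(rvest)  # read_html(), html_nodes(), html_attr(), html_text(), and %>%

# Page that lists the report, and the exact link text to look for
webpage = read_html(x='http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother')
filelink = 'North America Rotary Rig Count Pivot Table (Feb 2011 - Current)'

# href attribute of every <a> tag (NA when the tag has no href)
urls = webpage %>%
        html_nodes('a') %>%
        html_attr('href')

# Displayed text of every <a> tag, with surrounding whitespace removed
labels = webpage %>%
        html_nodes('a') %>%
        html_text() %>%
        trimws()

# Pair each label with its URL and keep the row matching the file name
links = data.frame(labels=labels, urls=urls, stringsAsFactors=FALSE)
links[links$labels==filelink,]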
                                                             labels
287 North America Rotary Rig Count Pivot Table (Feb 2011 - Current)
                                                                                                       urls
287 http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NjU1OTg2fENoaWxkSUQ9MzYyMDEwfFR5cGU9MQ==&t=1
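If you only need the one URL, the same lookup can be wrapped in a small function. The following is a minimal sketch, not a tested solution for the live site: find_link_by_label is a hypothetical helper name, and the HTML snippet, link text, and example.com URLs are made up so the example runs offline. It also assumes the link text matches exactly after trimming whitespace, and it resolves relative hrefs against a base URL in case the page uses them.

```r
library(rvest)
library(xml2)

# Hypothetical helper: return the href of every <a> tag whose displayed
# text equals `label` (after trimming whitespace). Anchors without an
# href are skipped. If `base_url` is given, relative links are resolved.
find_link_by_label <- function(page, label, base_url = NULL) {
  anchors <- html_nodes(page, "a")
  text    <- trimws(html_text(anchors))
  hrefs   <- html_attr(anchors, "href")
  hits    <- hrefs[text == label & !is.na(hrefs)]
  if (!is.null(base_url) && length(hits) > 0) {
    hits <- url_absolute(hits, base_url)
  }
  hits
}

# Made-up page standing in for the real site (assumption, for illustration)
sample_page <- read_html('<html><body>
  <a href="/External.File?item=abc&amp;t=1"> Weekly Rig Count </a>
  <a href="http://example.com/other.xlsx">Other Report</a>
  <a name="top">Weekly Rig Count</a>
</body></html>')

rig_url <- find_link_by_label(sample_page, "Weekly Rig Count",
                              base_url = "http://example.com/")
print(rig_url)
```

Against the real page you would pass the result of read_html() on the page URL and the full file name as the label. The returned URL could then be handed to download.file() (with mode = "wb" for binary files) before reading the file into R; which reader to use depends on the file type, which the page itself does not make obvious.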

Upvotes: 1
