Reputation: 165
I am trying to read the file named "North America Rotary Rig Count Pivot Table (Feb 2011 - Current)" into R. However, Baker Hughes changes the URL slightly each week, so I cannot simply copy and paste the URL into my code. My question is: is it possible to access the website's HTML from R and find the file's URL programmatically? If not, what is the best way to get the URL short of manually copying and pasting it each week?
http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother
Upvotes: 0
Views: 119
Reputation: 1029
Here is a slight amendment to code I have used to pull all links from a website. It collects the href attribute and the displayed text of every anchor (a) tag, then filters for the label you want. It should suffice, though there may be a more efficient way to find a single link.
library(rvest)

# Read the page once
webpage <- read_html('http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother')

# Exact link text of the file we want
filelink <- 'North America Rotary Rig Count Pivot Table (Feb 2011 - Current)'

# All link destinations (href attributes)
urls <- webpage %>%
  html_nodes('a') %>%
  html_attr('href')

# The displayed text of each link, with surrounding whitespace trimmed
labels <- webpage %>%
  html_nodes('a') %>%
  html_text() %>%
  trimws()

# Pair labels with URLs and keep only the row whose label matches
links <- data.frame(labels = labels, urls = urls, stringsAsFactors = FALSE)
links[links$labels == filelink, ]
labels
287 North America Rotary Rig Count Pivot Table (Feb 2011 - Current)
urls
287 http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NjU1OTg2fENoaWxkSUQ9MzYyMDEwfFR5cGU9MQ==&t=1
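Once you have the matching row, you can pass the URL straight to a download. A minimal sketch of the follow-up step, assuming the link serves an Excel workbook and that you have the readxl package available (neither is confirmed by the page itself, so adjust the extension and reader to whatever the file actually is):

```r
# Hypothetical follow-up: download the matched URL and read it into R.
# Assumes the link points to an Excel file; change fileext and the
# reader function if Baker Hughes serves a different format.
fileurl <- links$urls[links$labels == filelink]

tmp <- tempfile(fileext = '.xlsx')
download.file(fileurl, tmp, mode = 'wb')  # mode = 'wb' matters on Windows for binary files

# readxl::read_excel is one option for reading the workbook
rigcounts <- readxl::read_excel(tmp)
```

Because the page is re-scraped on every run, this keeps working when Baker Hughes rotates the URL, as long as the link text stays the same.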
Upvotes: 1