Reputation: 11
I am trying to scrape the link to Google Patents from this page, https://datatool.patentsview.org/#detail/patent/10745438, but when I print every link in an 'a' tag, only unrelated links come up.
Here is my code so far:
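import requests
from bs4 import BeautifulSoup

url = 'https://datatool.patentsview.org/#detail/patent/10745438'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
links = []
print(soup)
for link in soup.find_all('a', href=True):
    links.append(link['href'])  # collect each href so the list is actually filled
    print(link['href'])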
When I print the soup, the 'a' tag with the link to Google Patents isn't in it, and the link isn't in the list either. The only thing printed is
http://uspto.gov/
tel:1-800-786-9199
./#viz/relationships
./#viz/locations
./#viz/comparisons
None of these is the link I need. Is Google protecting its links in some way, or is there another way I can retrieve the link to the Google patent or be redirected to that page?
Upvotes: 1
Views: 489
Reputation: 9639
Don't scrape it, just do some link hacking and build the Google Patents URL from the patent number at the end of your URL:
url = 'https://datatool.patentsview.org/#detail/patent/10745438'
google_patents_url = 'https://www.google.com/patents/US' + url.rsplit('/', 1)[1]
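If you need this for more than one patent, the one-liner can be wrapped in a small function. This is only a sketch: the helper name `google_patents_url_from_patentsview` is made up for illustration, and it assumes the patent number is always the last segment after the '#' and that it is a US patent, as in your example. It also shows why scraping came up empty: everything after '#' is a URL fragment, which is not sent to the server, so `requests` fetches the same base page whatever the patent number is.

```python
from urllib.parse import urlsplit

# URL form taken from the answer above; assumed to hold for other US patents too.
GOOGLE_PATENTS_PREFIX = 'https://www.google.com/patents/US'


def google_patents_url_from_patentsview(url):
    """Hypothetical helper: build a Google Patents URL from a PatentsView detail URL."""
    # The part after '#' is the fragment, e.g. 'detail/patent/10745438'.
    fragment = urlsplit(url).fragment
    # Assumption: the patent number is the last '/'-separated piece of the fragment.
    patent_number = fragment.rsplit('/', 1)[-1]
    if not patent_number:
        raise ValueError('no patent number found in URL fragment: %r' % url)
    return GOOGLE_PATENTS_PREFIX + patent_number


url = 'https://datatool.patentsview.org/#detail/patent/10745438'
print(urlsplit(url).fragment)                    # the part the server never sees
print(google_patents_url_from_patentsview(url))  # the derived Google Patents link
```

The check for an empty patent number is there so that a URL without a fragment fails loudly instead of silently producing a link with no number in it.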
Upvotes: 1