Zaid Barkat

Reputation: 11

Cannot scrape Google Patents URL with Python and Beautiful Soup

I am trying to scrape the link to Google Patents from this page, https://datatool.patentsview.org/#detail/patent/10745438, but when I print all of the links found in 'a' tags, only unrelated links come up.

Here is my code so far:

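import requests
from bs4 import BeautifulSoup

url = 'https://datatool.patentsview.org/#detail/patent/10745438'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Collect and print every href found in the HTML returned by the server
links = []
print(soup)
for link in soup.find_all('a', href=True):
    links.append(link['href'])
    print(link['href'])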

When I print the soup, the 'a' tag with the Google Patents link is missing, and the link is not in the list either. The only links printed are:

http://uspto.gov/
tel:1-800-786-9199
./#viz/relationships
./#viz/locations
./#viz/comparisons

None of these are what I need. Is Google protecting their links in some way, or is there another way to retrieve the Google Patents link or redirect to that page?

Upvotes: 1

Views: 489

Answers (1)

RJ Adriaansen

Reputation: 9639

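Everything after the # in that URL is a fragment, which is never sent to the server, so requests only gets the page shell and not the patent details. Don't scrape it, just do some link hacking and build the Google Patents URL from the patent number: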

url = 'https://datatool.patentsview.org/#detail/patent/10745438'
google_patents_url = 'https://www.google.com/patents/US' + url.rsplit('/', 1)[1]
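
If you need this for many patents, the same idea can be wrapped in a small helper. This is only a sketch: the function name `google_patents_url` is made up for illustration, and it assumes the patent number is always the last segment of the URL fragment and that the patent is a US patent, as in the snippet above.

```python
from urllib.parse import urlparse


def google_patents_url(patentsview_url: str) -> str:
    """Build a Google Patents URL from a PatentsView detail URL (hypothetical helper)."""
    # The patent number sits in the URL fragment, e.g. "detail/patent/10745438".
    fragment = urlparse(patentsview_url).fragment
    patent_number = fragment.rsplit('/', 1)[-1]
    if not patent_number:
        raise ValueError(f'no patent number found in {patentsview_url!r}')
    # Same URL pattern as the snippet above; assumes a US patent.
    return 'https://www.google.com/patents/US' + patent_number


url = 'https://datatool.patentsview.org/#detail/patent/10745438'
print(google_patents_url(url))
```

Parsing the fragment with `urlparse` rather than splitting the whole URL means a URL with no fragment raises an error instead of silently producing a wrong link.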

Upvotes: 1
