Bernie Zhu

Reputation: 43

Is there any way to scrape the links to all patents from a Google Patents search?

I want to scrape links to patents from a Google Patents search using BeautifulSoup, but I'm not sure whether Google renders its HTML with JavaScript, which BeautifulSoup cannot parse, or whether the issue is something else.

Here is some simple code:

import requests
from bs4 import BeautifulSoup

url = 'https://patents.google.com/?assignee=Roche&after=priority:20110602&type=PATENT&num=100'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

links = []
for link in soup.find_all('a', href=True):
    print(link['href'])

I also want to append the links to the list, but nothing is printed because the soup contains no 'a' tags. Is there any way to grab the links to all of the patents?

Upvotes: 1

Views: 1629

Answers (1)

Bhavya Parikh

Reputation: 3400

The data is rendered dynamically, so it is hard to get with bs4. Instead, open Chrome's developer tools.

Go to the Network tab, select the XHR filter, and reload the page. Requests will appear under the Name column; one of them returns all of the search data as JSON.

Copy the address of that request and call it with the requests module. You can then extract whatever data you need from the response.

If you want the individual links, each one is built from the patent's publication_number: join it with the original URL to get the link to each publication.

import requests

main_url = "https://patents.google.com/"
params = "?assignee=Roche&after=priority:20110602&type=PATENT&num=100"

# XHR endpoint copied from the Network tab; the search query is percent-encoded in the `url` parameter
res = requests.get("https://patents.google.com/xhr/query?url=assignee%3DRoche%26after%3Dpriority%3A20110602%26type%3DPATENT%26num%3D100&exp=")
main_data = res.json()
data = main_data['results']['cluster']

# Each entry in the first cluster describes one patent
for result in data[0]['result']:
    num = result['patent']['publication_number']
    print(num)
    # Build the patent page link from the publication number
    print(main_url + "patent/" + num + "/en" + params)

Output:

US10287352B2
https://patents.google.com/patent/US10287352B2/en?assignee=Roche&after=priority:20110602&type=PATENT&num=100
US10364292B2
https://patents.google.com/patent/US10364292B2/en?assignee=Roche&after=priority:20110602&type=PATENT&num=100
US10494633B2
.....
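
Since the question also asked for the links to be collected in a list, here is a minimal sketch of the same approach split into two helpers. The names `build_xhr_url` and `extract_links` are hypothetical, not part of any library. The sketch assumes the JSON keeps the `results` / `cluster` / `result` / `patent` / `publication_number` layout used above, and that every cluster has the same shape as the first one. The endpoint is undocumented, so it may change.

```python
from urllib.parse import quote

MAIN_URL = "https://patents.google.com/"


def build_xhr_url(query):
    """Build the XHR address for a search query string (hypothetical helper).

    The query is percent-encoded and passed as the value of the `url`
    parameter, mirroring the address copied from the Network tab.
    """
    return MAIN_URL + "xhr/query?url=" + quote(query, safe="") + "&exp="


def extract_links(payload, query):
    """Collect one patent page link per result in the decoded JSON (hypothetical helper).

    Assumes every cluster has the same shape as the first one.
    """
    links = []
    for cluster in payload["results"]["cluster"]:
        for item in cluster["result"]:
            num = item["patent"]["publication_number"]
            links.append(MAIN_URL + "patent/" + num + "/en?" + query)
    return links
```

To use it, pass the search string without the leading `?`, for example `query = "assignee=Roche&after=priority:20110602&type=PATENT&num=100"`, then call `extract_links(requests.get(build_xhr_url(query)).json(), query)`.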


Upvotes: 2
