Reputation: 2870
I am trying to scrape the links to the documentation of the MATLAB modules from https://www.mathworks.com/help/.
I usually view the page source in Google Chrome to find the pattern of the information I need. In this case, that information does not appear in the page source.
Each item in the left-hand box on that page has a corresponding link. I would like to extract the names of all the items in that box along with their corresponding links.
Thank you for your help!
Upvotes: 0
Views: 34
Reputation: 1191
I found the JSON that they use for that section. Here is how to read it:
import requests

# JSON file the help page uses for that section
help_json = requests.get('https://www.mathworks.com/help/all_product_doc.json').json()
base_url = 'https://www.mathworks.com/help/'

for content in help_json:
    # Item name, then its documentation link
    print(content['displayname'])
    print(base_url + content['helplocation'] + '\n')
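If you want the names and links in a structure you can reuse rather than just printed, something like the following might work. It is only a sketch: the keys `displayname` and `helplocation` come from the JSON above, while `build_link_table` and `fetch_link_table` are hypothetical helper names, and I am assuming every record is a flat dict. It uses `urljoin` instead of plain concatenation, so a `helplocation` that starts with `/` would be resolved against the site root rather than appended to `base_url`.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.mathworks.com/help/"


def build_link_table(records, base_url=BASE_URL):
    """Map each item's display name to its absolute documentation URL.

    Hypothetical helper: assumes each record is a dict with the
    'displayname' and 'helplocation' keys seen in the JSON above.
    """
    table = {}
    for record in records:
        name = record.get("displayname")
        location = record.get("helplocation")
        # Skip records that are missing either field
        if not name or not location:
            continue
        table[name] = urljoin(base_url, location)
    return table


def fetch_link_table(json_url=BASE_URL + "all_product_doc.json"):
    """Download the JSON (same URL as above) and build the table."""
    import requests  # imported here so build_link_table works without it

    response = requests.get(json_url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return build_link_table(response.json())
```

Calling `fetch_link_table()` should then give you a dict you can loop over or write to a file, for example with `for name, link in fetch_link_table().items(): print(name, link)`.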
Upvotes: 1