Akira

Reputation: 2870

How to extract information that is not displayed when viewing page source in Google Chrome?

I am trying to scrape the links to the documentation of MATLAB modules from https://www.mathworks.com/help/.

I usually view the page source in Google Chrome to find the pattern of the information I need. In this case, that information does not appear in the page source.

[Screenshot: the list of items in the left-hand box of the help page]

As you can see, each item in the left-hand box has a corresponding link. I would like to extract the names of all items in that box along with their corresponding links.

Thank you for your help!

Upvotes: 0

Views: 34

Answers (1)

LuckyZakary

Reputation: 1191

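I found the JSON file that the page uses for that section. The list is presumably filled in by JavaScript after the page loads, which would explain why it is missing from the page source. Here is how to read it: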


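import requests

# JSON file that the help page uses for the product list
help_json = requests.get('https://www.mathworks.com/help/all_product_doc.json').json()

base_url = 'https://www.mathworks.com/help/'

# Print each item's display name followed by its documentation link
for content in help_json:
    print(content['displayname'])
    print(base_url + content['helplocation'] + '\n')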
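
If you want to keep the names and links rather than just print them, here is a minimal sketch that collects them into a dictionary. It assumes each entry is a dict with the `displayname` and `helplocation` keys used above; the `build_links` helper and the sample entries are made up for illustration and are not real MathWorks data.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.mathworks.com/help/'


def build_links(entries, base_url=BASE_URL):
    """Map each item's display name to its absolute documentation URL."""
    links = {}
    for entry in entries:
        # Use .get() so an entry missing either key is skipped
        # instead of raising KeyError.
        name = entry.get('displayname')
        location = entry.get('helplocation')
        if name and location:
            # urljoin resolves the relative location against the base URL.
            links[name] = urljoin(base_url, location)
    return links


# Hypothetical sample entries shaped like the JSON above (not real data).
sample = [
    {'displayname': 'Example Product', 'helplocation': 'example/index.html'},
    {'displayname': 'No Link'},
]
print(build_links(sample))
```

In practice you would pass `help_json` from the request above instead of `sample`. Note that if two items share a display name, the later one overwrites the earlier one in the dictionary.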

Upvotes: 1
