Reputation: 387
I am a beginner in Python 3.6, using BeautifulSoup to perform "web-scraping."
Once I have run requests.get() and called prettify() on the output, I notice that the webpage does not return the values themselves; it seems to contain code that is related to the values instead.
Here is the link to the specific webpage: http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=AngeliqueKerber&f=r1
I am trying to extract the hand that the player plays with. It is highlighted in yellow in the picture below:
Picture of what I am trying to obtain:
I would also appreciate feedback on how the question is laid out. If it is confusing (or non-standard), feedback like that will help me ask questions appropriately in the future.
Upvotes: 1
Views: 248
Reputation: 2690
Someone has made a really great GitHub project for this website. It is practically an API: you can fork it, change a few things, and then use it the way you want to.
It uses the Selenium webdriver, but it is high quality.
Upvotes: 0
Reputation: 527
There are two options (mostly).
The first one is easier and slower: browser emulation. You use the site the way a normal user would, through a browser. There is a Python module for this task, selenium. It uses a specific webdriver to control the browser. There are plenty of webdrivers available (for example, chromedriver to use Chrome). There are also headless solutions (PhantomJS, for example).
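As a rough illustration of the browser-emulation option, here is a minimal sketch. It assumes the third-party selenium package and a local Chrome installation; the helper names rendered_html and scrape_with_chrome are made up for this example and are not part of any library.

```python
def rendered_html(driver, url):
    """Load `url` in an already-started browser session and return its HTML.

    `driver` is any object with a `get(url)` method and a `page_source`
    attribute, which is what selenium webdriver instances provide.
    """
    driver.get(url)
    # page_source reflects the DOM as the browser currently sees it,
    # i.e. after the page's scripts have had a chance to run.
    return driver.page_source


def scrape_with_chrome(url):
    """Hypothetical convenience wrapper: start headless Chrome, fetch, quit."""
    # selenium is third-party, so it is imported only when this path is used.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        return rendered_html(driver, url)
    finally:
        # Always close the browser, even if loading the page failed.
        driver.quit()
```

The returned string can then be passed to BeautifulSoup as usual. If the value is filled in some time after the page loads, an explicit wait (selenium's WebDriverWait) may be needed before reading page_source.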
The other way is smarter and faster: XMLHttpRequests (XHRs). Basically, the site uses some hidden API to get its data via JavaScript, and you try to find out exactly how. In most cases you can use the Inspect Element toolbox of your browser. Switch to its Network tab, clear it, and reload the page to get the results. Then filter the list to show only XHRs. These requests usually return JSON, which is easily converted into a Python dictionary using the json() method of the Response object.
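A minimal sketch of the XHR approach, assuming the third-party requests package. The function name fetch_json, its get parameter, and the example.com URL are placeholders invented for this example; the real endpoint and parameters have to be read from the Network tab.

```python
def fetch_json(url, params=None, get=None):
    """Request `url` the way the page's own JavaScript would and decode the JSON.

    `get` defaults to requests.get; it can be replaced with any callable
    that has the same interface (handy for trying this out offline).
    """
    if get is None:
        import requests  # third-party; only needed for real network calls
        get = requests.get

    response = get(
        url,
        params=params,
        # Some sites check this header to recognise XHR calls; it is optional.
        headers={"X-Requested-With": "XMLHttpRequest"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx answers
    return response.json()       # JSON body -> dict / list


# Example call with a placeholder endpoint (not a real URL of the site above):
# data = fetch_json("https://example.com/api/player", params={"p": "SomePlayer"})
```

If the request found in the Network tab returns something other than JSON (for example a JavaScript file), response.text can be parsed instead of calling json().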
Upvotes: 1