Python Beautiful Soup (HTML Parsing)

Question

I am a beginner in Python 3.6, using BeautifulSoup to do "web scraping."

After I run requests.get() and prettify() the output, I notice that the webpage does not return the values; instead, it seems to contain code related to the value.

Here is the link to the specific webpage: http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=AngeliqueKerber&f=r1

I am trying to extract which hand the player uses in tennis (the value highlighted in yellow in my screenshot).
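A minimal offline sketch of the symptom described above. The page source here is hypothetical and is not Tennis Abstract's real markup: the visible element is empty and a script fills it in. requests.get(url).text returns source like this, and BeautifulSoup parses it without running any JavaScript, so the element's text comes back empty. If the value is written into the script code itself, as the question suspects, it can sometimes be read from the script text with a regular expression; the `var hand = '...'` pattern is an assumption about the page, not something verified.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical page source: the value is set by JavaScript, not written into the HTML.
STATIC_HTML = """
<html><body>
  <span id="hand"></span>
  <script>
    var hand = 'L';
    document.getElementById('hand').textContent = hand;
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# What requests + BeautifulSoup see: the placeholder is empty because no JavaScript ran.
print(repr(soup.find(id="hand").get_text()))  # -> ''


def hand_from_script(soup: BeautifulSoup) -> Optional[str]:
    """Return the value of an assumed `var hand = '...'` assignment in a <script> tag."""
    pattern = re.compile(r"var\s+hand\s*=\s*'([^']*)'")
    for script in soup.find_all("script"):
        match = pattern.search(script.string or "")  # .string is None for empty tags
        if match:
            return match.group(1)
    return None


print(hand_from_script(soup))  # -> L
```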

I would appreciate feedback on how the question is written: if it is confusing or non-standard, such feedback will help me ask questions appropriately in the future.

Dmitry Arkhipenko · Accepted Answer

There are mainly two options.

The first is easier but slower: browser emulation. You use the site as a normal user would, with a browser. There is a Python module for this task: selenium. It drives a real browser through a specific webdriver, and there are plenty of webdrivers available (for example, chromedriver for Chrome). There are also headless solutions (PhantomJS, for example).
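A minimal sketch of the browser-emulation route, assuming Selenium with Chrome and a matching chromedriver installed. It uses headless Chrome instead of PhantomJS, since PhantomJS is no longer maintained. Selenium is imported inside `fetch_rendered_html` so the parsing helper can be used on its own. The "Plays: <hand>" text that `hand_from_rendered_html` searches for is a hypothetical guess at the rendered page, not its verified layout; inspect the real DOM in the browser and adjust the pattern.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def fetch_rendered_html(url: str) -> str:
    """Load the page in headless Chrome and return the DOM after JavaScript has run."""
    # Imported here so the parser below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # If the value arrives later via an async request, add an explicit
        # WebDriverWait here before reading page_source.
        return driver.page_source  # HTML of the current, JS-modified DOM
    finally:
        driver.quit()


def hand_from_rendered_html(html: str) -> Optional[str]:
    """Return the word after an assumed 'Plays:' label, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=re.compile(r"Plays:"))  # first text node containing the label
    if label is None:
        return None
    match = re.search(r"Plays:\s*(\w+)", label)
    return match.group(1) if match else None
```

Usage would look like `hand_from_rendered_html(fetch_rendered_html("http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=AngeliqueKerber&f=r1"))`, once the pattern matches the real page.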

The other way is smarter and faster: XMLHttpRequests (XHRs). Basically, the site uses some hidden API to fetch the data via JavaScript, and you try to find out exactly how. In most cases you can use your browser's Inspect Element toolbox: switch to its Network tab, clear it, and then trigger the results. Then filter the list to show only XHRs. These usually return JSON, which is easily converted into a Python dictionary with the json() method of the Response object.
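A minimal sketch of the XHR route with requests, assuming you have already found the endpoint in the Network tab. The `"hand"` key is a hypothetical placeholder; the real payload's structure, and any headers or query parameters the endpoint needs, have to be copied from what the browser actually sends.

```python
from typing import Any, Dict

import requests


def fetch_xhr_json(url: str) -> Any:
    """Call the endpoint the page's JavaScript uses and decode its JSON body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()       # JSON text -> Python dict/list


def hand_from_payload(payload: Dict[str, Any]) -> str:
    # "hand" is a hypothetical key; check the real JSON in the Network tab.
    return payload["hand"]
```

Because the JSON is already structured data, no HTML parsing is needed: `hand_from_payload(fetch_xhr_json(url))`, where `url` is the XHR address copied from the browser.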
