Nico Sánchez
Nico Sánchez

Reputation: 95

Webscraping table with multiple pages using BeautifulSoup

I'm trying to scrape this webpage https://www.whoscored.com/Statistics using BeautifulSoup in order to obtain all the information of the player statistics table. I'm having lot of difficulties and was wondering if anyone would be able to help me.

url = 'https://www.whoscored.com/Statistics'
html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")
text = [element.text for element in soup.find_all('div' {'id':"statistics-table-summary"})]

My problem lies in the fact that I don't know what the correct tag is to obtain that table. Also the table has several pages and I would like to scrape every single one. The only indication I've seen of a change of page in the table is the number in the code below:

<div id="statistics-table-summary" class="" data-fwsc="11">

Upvotes: 1

Views: 194

Answers (1)

C. Gluck
C. Gluck

Reputation: 59

It looks to me like that site loads their data in using Javascript. In order to grab the data, you'll have to mimic how a browser loads a page; the requests library isn't enough. I'd recommend taking a look at a tool like Selenium, which uses a "robotic browser" to load the page. After the page is loaded, you can then use BeautifulSoup to retrieve the data you need.

Here's a link to a helpful tutorial from RealPython.

Good luck!

Upvotes: 4

Related Questions