Scraping table data from a webpage

Question

I am trying to learn Python and Portuguese, so I thought I could kill two birds with one stone.

Here is an example of one of the pages: https://conjugator.reverso.net/conjugation-portuguese-verb-ser.html. I want to download the data in the blue tables: the first table is called Presente, the next is called Pretérito Perfeito, and so on.

Below is my code, but I'm struggling. My results variable does contain the data I need, but pulling out the exact part I want is beyond me because the div tags don't have ids.

Is there a better way to do this?

 import requests
 from bs4 import BeautifulSoup

 URL = 'https://conjugator.reverso.net/conjugation-portuguese-verb-ser.html'
 page = requests.get(URL)  # download the conjugation page for "ser"
 soup = BeautifulSoup(page.content, 'html.parser')
 results = soup.find(id='ch_divSimple')  # container that holds the blue tables
 mychk = results.prettify()  # pretty-printed HTML, handy for inspecting the structure
 tbl_elems = results.find_all('section', class_='wrap-verbs-listing')

XavierBrt · Accepted Answer

They don't have ids, but they do have classes. You can do:

results.find_all("div", "blue-box-wrap")

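Here blue-box-wrap is a CSS class: when you pass a plain string as the second argument to find_all, BeautifulSoup matches it against the class attribute.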
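The same class filter can be spelled a few ways in BeautifulSoup. Below is a minimal offline sketch comparing them. The HTML string is made up for illustration and only mimics the ch_divSimple container and blue-box-wrap class mentioned above; it is not the real markup of the reverso page.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page: two "blue" boxes and one unrelated div.
html = """
<div id="ch_divSimple">
  <div class="blue-box-wrap"><p>Presente</p></div>
  <div class="blue-box-wrap"><p>Pretérito Perfeito</p></div>
  <div class="other-box"><p>Not a conjugation table</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find(id="ch_divSimple")

# 1. A string as the second positional argument is matched against the class.
by_positional = results.find_all("div", "blue-box-wrap")

# 2. The same filter using the class_ keyword (class is reserved in Python).
by_keyword = results.find_all("div", class_="blue-box-wrap")

# 3. The same filter written as a CSS selector.
by_selector = results.select("div.blue-box-wrap")

for boxes in (by_positional, by_keyword, by_selector):
    # Each line prints: ['Presente', 'Pretérito Perfeito']
    print([box.p.get_text() for box in boxes])
```

All three return the two blue-box-wrap divs and skip the other-box div; which spelling to use is mostly a matter of taste.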

This find_all call returns a ResultSet of length 22, since there are 22 blue tables on the page. You can select the one you want by indexing, for example the first one:

blue_tables = results.find_all("div", "blue-box-wrap")
blue_tables[0]
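To go from the blue boxes to the actual table contents, one option is to loop over the ResultSet and pull out each box's heading and rows. This is a sketch under an explicit assumption: that each blue box holds a `<p>` heading followed by `<li>` items containing the pronoun and the verb form. The sample HTML and the helper name extract_tables are hypothetical; check the live markup (for example with results.prettify()) and adjust the tag names if they differ.

```python
from bs4 import BeautifulSoup

# HYPOTHETICAL sample markup: assumes each blue box has a <p> heading and
# <li> rows of pronoun + verb form. The real page may be structured differently.
SAMPLE_HTML = """
<div id="ch_divSimple">
  <div class="blue-box-wrap">
    <p>Presente</p>
    <ul>
      <li><i>eu</i> <i>sou</i></li>
      <li><i>tu</i> <i>és</i></li>
    </ul>
  </div>
  <div class="blue-box-wrap">
    <p>Pretérito Perfeito</p>
    <ul>
      <li><i>eu</i> <i>fui</i></li>
      <li><i>tu</i> <i>foste</i></li>
    </ul>
  </div>
</div>
"""


def extract_tables(html):
    """Return a list of (title, rows) pairs, one per blue box, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find(id="ch_divSimple")
    tables = []
    for box in results.find_all("div", "blue-box-wrap"):
        heading = box.find("p")  # assumed: first <p> in the box is its title
        title = heading.get_text(strip=True) if heading else "(untitled)"
        # get_text(" ", strip=True) joins the pronoun and verb with one space
        rows = [li.get_text(" ", strip=True) for li in box.find_all("li")]
        tables.append((title, rows))
    return tables


for title, rows in extract_tables(SAMPLE_HTML):
    print(title, rows)
```

With the question's code you would call extract_tables(page.content) instead of using the sample string. The sketch returns a list of pairs rather than a dict because a heading such as Presente may appear in more than one mood on the same page, and a dict keyed by title would silently overwrite the earlier table.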
