mHelpMe

Reputation: 6668

scraping table data from a webpage

I am trying to learn Python and Portuguese, so I thought I could kill two birds with one stone.

Here is an example of one of the pages. I want to download the data in the blue tables: the first such table is called Presente, the next is called Pretérito Perfeito, and so on.

Below is my code, but I'm struggling. My results variable does contain the data I need, but pulling out the exact parts is beyond me because the div tags don't have ids.

Is there a better way to do this?

 import requests
 from bs4 import BeautifulSoup

 URL = 'https://conjugator.reverso.net/conjugation-portuguese-verb-ser.html'
 page = requests.get(URL)
 soup = BeautifulSoup(page.content, 'html.parser')
 results = soup.find(id='ch_divSimple')
 mychk = results.prettify()
 tbl_elems = results.find_all('section', class_='wrap-verbs-listing')

Upvotes: 0

Views: 50

Answers (2)

XavierBrt

Reputation: 1249

The divs don't have ids, but they do have classes, so you can search by class:

results.find_all("div", "blue-box-wrap")

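Here blue-box-wrap is a CSS class: when the second positional argument to find_all is a string, BeautifulSoup matches it against the class attribute.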

This returns a ResultSet of length 22, since there are 22 blue tables on the page. You can pick out the one you want by index, for example the first one:

blue_tables = results.find_all("div", "blue-box-wrap")
blue_tables[0]
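
To go from the matched divs to usable data, you can loop over them and read out a title and the rows. Below is a minimal sketch that runs against a small inline HTML sample rather than the live site. The markup in SAMPLE_HTML (a p tag holding the table name, li tags holding the conjugations) is an assumption for illustration and may not match the real page exactly, and extract_tables is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page (assumption).
SAMPLE_HTML = """
<div id="ch_divSimple">
  <div class="blue-box-wrap">
    <p>Presente</p>
    <ul class="wrap-verbs-listing">
      <li>eu sou</li>
      <li>tu és</li>
    </ul>
  </div>
  <div class="blue-box-wrap">
    <p>Pretérito Perfeito</p>
    <ul class="wrap-verbs-listing">
      <li>eu fui</li>
      <li>tu foste</li>
    </ul>
  </div>
</div>
"""


def extract_tables(html):
    """Map each blue table's title to the list of its row texts."""
    soup = BeautifulSoup(html, "html.parser")
    tables = {}
    for box in soup.find_all("div", "blue-box-wrap"):
        title = box.find("p")
        if title is None:
            # Skip boxes without a title tag (assumed layout does not apply).
            continue
        rows = [li.get_text(" ", strip=True) for li in box.find_all("li")]
        tables[title.get_text(strip=True)] = rows
    return tables


tables = extract_tables(SAMPLE_HTML)
for name, rows in tables.items():
    print(name, rows)
```

For the real page you would pass page.content instead of SAMPLE_HTML, and adjust the tag names after checking the actual markup in your browser's inspector.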

Upvotes: 1

Shubham Sharma

Reputation: 71689

Replace:

 results = soup.find(id='ch_divSimple')
 mychk = results.prettify()
 tbl_elems = results.find_all('section', class_='wrap-verbs-listing')

With:

results = soup.find("div", attrs={"class": 'blue-box-wrap'})
tbl_elems = results.find_all('ul', class_='wrap-verbs-listing')
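
Note that find returns only the first matching element, so the snippet above looks inside the first blue box only. If you want the lists from every blue box, one option is a CSS selector via select. Here is a rough sketch on a small inline sample; the markup in SAMPLE_HTML is an assumption for illustration, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page (assumption).
SAMPLE_HTML = """
<div class="blue-box-wrap">
  <ul class="wrap-verbs-listing"><li>eu sou</li></ul>
</div>
<div class="blue-box-wrap">
  <ul class="wrap-verbs-listing"><li>eu fui</li></ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first blue box, so only its lists are searched.
first_box = soup.find("div", attrs={"class": "blue-box-wrap"})
first_lists = first_box.find_all("ul", class_="wrap-verbs-listing")

# select() with a descendant selector collects the lists from every blue box.
all_lists = soup.select("div.blue-box-wrap ul.wrap-verbs-listing")

print(len(first_lists), len(all_lists))
```

Which of the two you want depends on whether you need one tense or all of them.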

Upvotes: 1
