Python Data Scraping - Extracting lines in the table where the tag '<td>' exists

Question

I have been working on web scraping and have gotten pretty far in preparing my table from the web page I am scraping from.

The problem is that I can't work out how to extract only the entries which contain the data (lines which start with '<td>'). My code is as follows:

import requests
from bs4 import BeautifulSoup

url = requests.get('https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods')

soup = BeautifulSoup(url.text,'lxml')
print(soup.prettify())

table_classes = {'class':'sortable'}
raw_table = soup.findAll("table", table_classes)
print(raw_table)

Adding the next line of code causes the error "ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?":

td_tags = raw_table.find_all('<td>')
td_tags

Looking at the data type I then tried to use find() and it still caused the same error, so I then tried looping over each line with the following code:

for line in raw_table:
    if line.get_text().find('<td>') > -1:
        line

When I run this loop, nothing happens. If I put the line outside of the 'if' statement, it just returns every line in the table 'Canada_table_raw'.

How can I get the entries with the '<td>' tag so that I can then put the results into a pandas data frame?

Edeki Okoh · Accepted Answer

You are missing one step: looping over the result that findAll() returns.

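import requests
from bs4 import BeautifulSoup

url = requests.get(
    'https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods')

soup = BeautifulSoup(url.text, 'lxml')

table_classes = {'class': 'sortable'}
# findAll() returns a ResultSet (a list of the matching <table> tags)
raw_table = soup.findAll("table", table_classes)
#print(raw_table)
# Loop over each table in the ResultSet and print its <td> cells
for table in raw_table:
    print(table.findAll('td'))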

As the error message says, findAll() returns a ResultSet object, so you need to iterate over it to get the specific elements that you need. In this case we print all of the td elements found in each table of the ResultSet, which gives the following output:

[Toronto CMA Average
, 
, All
, 5,113,149
, 5903.63
, 866
, 9.0
, 40,704
, 10.6
, 11.4
, 
, 
, 
,
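To get from a flat list of cells like the one above to something a pandas data frame can take, one option is to walk the table row by row instead. The following is only a minimal sketch under some assumptions: SAMPLE_HTML is a made-up stand-in for the downloaded page (the second data row is invented), table_rows is a hypothetical helper name, and the built-in 'html.parser' is used in place of 'lxml' so that the sketch runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page so the sketch runs offline.
# The "Example Neighbourhood" row is invented for illustration.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Population</th></tr>
  <tr><td>Toronto CMA Average</td><td>5,113,149</td></tr>
  <tr><td>Example Neighbourhood</td><td>12,345</td></tr>
</table>
"""


def table_rows(html, table_class="sortable"):
    """Return (header, rows) for the first table carrying table_class."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns a single Tag (or None); find_all() returns a ResultSet,
    # which is why calling find_all() on the ResultSet itself fails.
    table = soup.find("table", class_=table_class)
    if table is None:
        return [], []

    header, rows = [], []
    for tr in table.find_all("tr"):
        td_cells = tr.find_all("td")
        th_cells = tr.find_all("th")
        if td_cells:
            # Data row: keep the text of every <td> cell
            rows.append([td.get_text(strip=True) for td in td_cells])
        elif th_cells:
            # Header row: only <th> cells, no <td>
            header = [th.get_text(strip=True) for th in th_cells]
    return header, rows


header, rows = table_rows(SAMPLE_HTML)
print(header)
print(rows)
```

If pandas is installed, passing the result on as pandas.DataFrame(rows, columns=header) should give the data frame the question asks for. On the real page the rows may not all have the same number of cells, so some cleanup could still be needed first.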
