Unable to extract item with beautifulsoup

Question

I'm trying to use Beautiful Soup to return the number of datasets listed on this website.

However, I'm not sure what is wrong with my code.

I can't seem to extract just the number of datasets (the page shows 3908).

base_url = www.quandl.com/data/TSE
web_content = BeautifulSoup(requests.get(base_url).text, "html.parser")
for stats in web_content.findAll('table', attrs={'class'}):
     print(stats)

How should I structure my code?

MLSC · Accepted Answer

Try:

attrs={'class' : ''}

So you have:

from bs4 import BeautifulSoup
import requests
base_url = 'http://www.quandl.com/data/TSE'
web_content = BeautifulSoup(requests.get(base_url).text, "html.parser")
for stats in web_content.find_all('table', attrs={'class': ''}):
    print(stats)
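
As a quick illustration of why the original call misbehaves: {'class'} is a Python set literal, while attrs is meant to be a dictionary mapping attribute names to values. The check below does not touch BeautifulSoup at all, and the variable names are made up for the example:

```python
# Braces without a colon build a set; with "key: value" they build a dict.
question_attrs = {'class'}      # set containing the string 'class'
answer_attrs = {'class': ''}    # dict mapping 'class' to an empty string

print(type(question_attrs).__name__)  # set
print(type(answer_attrs).__name__)    # dict
```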

Note: If the target page builds its content with JavaScript, requests alone is not a good fit, because it only downloads the raw HTML. You can try PhantomJS instead.

Edit:

from lxml import html
import requests
base_url = 'http://www.quandl.com/data/TSE'
web_content = requests.get(base_url).text
tree = html.fromstring(web_content)
print(tree.xpath('//tr/td/text()')[3])
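
Indexing with [3] depends on the exact order of the cells on the page. As a minimal sketch, assuming the count sits in a table cell right after a label cell, the value can be looked up by its label instead. The sample markup, the "Datasets" label and the helper name cell_after_label are all hypothetical, not taken from the Quandl page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which is not shown here.
SAMPLE_HTML = """
<table>
  <tr><td>Datasets</td><td>3908</td></tr>
  <tr><td>Updated</td><td>Daily</td></tr>
</table>
"""


def cell_after_label(html, label):
    """Return the text of the cell that follows the cell equal to label."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Ignore the last cell as a label: nothing follows it.
        if label in cells[:-1]:
            return cells[cells.index(label) + 1]
    return None  # label not found in any row


print(cell_after_label(SAMPLE_HTML, "Datasets"))
```

With the real page, the HTML would come from requests.get(base_url).text, and the label would need to match whatever text the page actually uses.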
