jake wong

Reputation: 5228

Unable to extract item with beautifulsoup

I'm trying to use Beautiful Soup to return the number of datasets listed on this website.

However, I'm not sure what is wrong with my code.

I can't seem to extract just the number of datasets (the page shows 3908).

base_url = 'http://www.quandl.com/data/TSE'
web_content = BeautifulSoup(requests.get(base_url).text, "html.parser")
for stats in web_content.findAll('table', attrs={'class'}):
    print(stats)

How should I structure my code?

Upvotes: 0

Views: 65

Answers (1)

MLSC

Reputation: 5972

Try:

attrs={'class' : ''}

So you have:

from bs4 import BeautifulSoup
import requests
base_url = 'http://www.quandl.com/data/TSE'
web_content = BeautifulSoup(requests.get(base_url).text, "html.parser")
for stats in web_content.findAll('table', attrs={'class' : ''}):
    print(stats)

Note: If the target page builds its content with JavaScript, requests alone is not a good fit, because it only fetches the raw HTML and does not run scripts. You can try PhantomJS instead.

Edit:

from lxml import html
import requests
base_url = 'http://www.quandl.com/data/TSE'
web_content = requests.get(base_url).text
tree = html.fromstring(web_content)
# Index 3 assumes the dataset count is the fourth <td> text node on the page.
print(tree.xpath('//tr/td/text()')[3])
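
A fixed index like [3] breaks as soon as the page layout shifts, so it may be safer to look the value up by its label. Below is a minimal sketch of that idea with Beautiful Soup. The HTML in SAMPLE_HTML is hypothetical (I am only assuming the count sits in the cell after a "Datasets" label), and find_value_by_label is a helper name made up for this example, so adjust both to the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's table layout is an assumption here.
SAMPLE_HTML = """
<table>
  <tr><td>Code</td><td>TSE</td></tr>
  <tr><td>Datasets</td><td>3,908</td></tr>
</table>
"""


def find_value_by_label(html_text, label):
    """Return the text of the cell right after the cell equal to `label`, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip the last cell: it has no following cell to return.
        for i, text in enumerate(cells[:-1]):
            if text == label:
                return cells[i + 1]
    return None


raw = find_value_by_label(SAMPLE_HTML, "Datasets")
# Strip thousands separators before converting to an integer.
dataset_count = int(raw.replace(",", "")) if raw else None
print(dataset_count)
```

To run it against the live page, pass requests.get(base_url).text in place of SAMPLE_HTML. If the function returns None, the label was not found in the downloaded HTML, which is also a hint that the content may be rendered by JavaScript.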

Upvotes: 1
