Reputation: 1246
I'm trying to scrape specific pieces of HTML data from websites, but I can't extract the parts I want. For instance, I set myself the challenge of scraping the number of followers from this blog, and I can't get at it.
I've tried urllib, requests, and BeautifulSoup, as well as the Jam API.
Here's what my code looks like at the moment:
from bs4 import BeautifulSoup
from urllib import urlopen
import json
import urllib2
html = urlopen('http://freelegalconsultancy.blogspot.co.uk/')
soup = BeautifulSoup(html, "lxml")
print soup
How would I go about pulling the number of followers in this instance?
Upvotes: 0
Views: 84
Reputation: 8006
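You can't grab the follower count this way because it lives in a widget that is loaded by JavaScript, so it is not part of the HTML that urlopen returns. What you can grab is anything present in the static HTML, selected by CSS class, by id, or by element name.
For example: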
from bs4 import BeautifulSoup
from urllib import urlopen
html = urlopen('http://freelegalconsultancy.blogspot.co.uk/')
soup = BeautifulSoup(html, "lxml")  # name the parser explicitly, as in the question
assert soup.h1.string == '\nLAW FOR ALL-M.MURALI MOHAN\n'
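To make the three lookups concrete, here is a minimal sketch in Python 3 that runs against an inline HTML string rather than the live blog. The markup, the ids (`description`, `followers-widget`) and the class name (`post`) are all made up for illustration and are not taken from the real page. It also shows why the follower count comes back empty: the container exists in the static HTML, but nothing has run the script that would fill it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what a plain HTTP fetch returns.
# None of these ids or class names come from the real blog.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example Blog</h1>
  <div id="description">Notes on law</div>
  <div class="post"><h3>First post</h3></div>
  <div class="post"><h3>Second post</h3></div>
  <div id="followers-widget"></div>
  <script>/* the follower count would be injected here at runtime */</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Lookup by element name: the first <h1> in the document.
title = soup.h1.get_text(strip=True)

# Lookup by id.
description = soup.find(id="description").get_text(strip=True)

# Lookup by CSS class: every <div class="post">, then its <h3>.
post_titles = [div.h3.get_text(strip=True)
               for div in soup.find_all("div", class_="post")]

# The widget container is in the static HTML, but it is empty because
# BeautifulSoup only parses markup and never executes the script.
followers = soup.find(id="followers-widget").get_text(strip=True)

print(title)
print(description)
print(post_titles)
print(repr(followers))
```

If the number only ever appears after scripts run, a static parser like this will not see it; you would need something that executes JavaScript, or the underlying request the widget makes, neither of which is covered here.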
Upvotes: 1