Reputation: 135
I am trying to automate obtaining the number of followers of different Twitter accounts using the page source.
I have the following code for one account:
from bs4 import BeautifulSoup
import requests
username='justinbieber'
url = 'https://www.twitter.com/'+username
r = requests.get(url)
soup = BeautifulSoup(r.content)
for tag in soup.findAll('a'):
    if tag.has_key('class'):
        if tag['class'] == 'ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor':
            if tag['href'] == '/justinbieber/followers':
                print tag.title
                break
I am not sure where I went wrong. I understand that the Twitter API can be used to obtain the number of followers. However, I would like to get this method working as well. Any suggestions?
I've modified the code from here
Upvotes: 1
Views: 2464
Reputation: 10090
If I were you, I'd pass the class name as an argument to the find() function instead of using find_all(), and I'd first look for the <li> element that contains the anchor you're looking for. It would look something like this:
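from bs4 import BeautifulSoup
import requests

username = 'justinbieber'
url = 'https://www.twitter.com/' + username
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, 'html.parser')

# Find the <li> that wraps the followers link, then read the anchor's title attribute
f = soup.find('li', class_="ProfileNav-item--followers")
title = f.find('a')['title']
print(title)
# 81,346,708 Followers

# Keep the leading number and drop the thousands separators
num_followers = int(title.split(' ')[0].replace(',', ''))
print(num_followers)
# 81346708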
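The conversion in the last step can be pulled into a small helper so it fails loudly when the title does not start with a number. This is only a sketch: the function name parse_follower_count is made up for illustration, and it assumes the title always has the form shown above ("81,346,708 Followers").

```python
import re


def parse_follower_count(title):
    """Return the leading integer of a string such as '81,346,708 Followers'.

    Hypothetical helper; assumes comma thousands separators as in the
    title shown in the answer.
    """
    match = re.match(r"\s*(\d[\d,]*)", title)
    if match is None:
        raise ValueError("no leading number in %r" % (title,))
    # Strip the separators before converting to int
    return int(match.group(1).replace(",", ""))


print(parse_follower_count("81,346,708 Followers"))
# 81346708
```

A regular expression is used here instead of split(' ') so that leading whitespace or a title without a space after the number does not break the conversion.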
PS: findAll() was renamed to find_all() in bs4.
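As for why the loop in the question probably never prints anything: in bs4, class is a multi-valued attribute, so tag['class'] comes back as a list of class names and never equals one long string. Also, tag.title looks for a child <title> tag rather than the title attribute. The sketch below shows this on a small hand-written HTML snippet. The markup is an assumption modelled on the class names quoted above, not a copy of Twitter's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the class names quoted in the question
html = """
<ul>
  <li class="ProfileNav-item ProfileNav-item--followers">
    <a class="ProfileNav-stat js-nav" href="/justinbieber/followers" title="81,346,708 Followers">Followers</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
anchor = soup.find("a", href="/justinbieber/followers")

# The class attribute is returned as a list, one entry per class name
classes = anchor["class"]
print(classes)
# Comparing the list with a single string is therefore always False
print(classes == "ProfileNav-stat js-nav")
# Membership tests on the list work as expected
print("ProfileNav-stat" in classes)

# anchor.title searches for a child <title> tag, which does not exist here
print(anchor.title)

# class_ matches when any one of the element's classes equals the argument
item = soup.find("li", class_="ProfileNav-item--followers")
print(item.find("a")["title"])
```

Matching on a single class with class_ is less brittle than comparing the full class string, since it keeps working if unrelated classes are added or reordered.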
Upvotes: 2