amiref

Reputation: 3431

Scraping a webpage with embedded tweet

I am trying to scrape a web page that contains an embedded tweet: https://thehill.com/homenews/news/376608-west-virginia-teachers-to-continue-strike-after-state-senate-passes-lower-raise. When I use Inspect Element in my browser, it shows the HTML element for the embedded tweet, but when I search the page source or call BeautifulSoup's findAll(), I get no results. How can I fix this problem?

Upvotes: 0

Views: 181

Answers (1)

chitown88

Reputation: 28565

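The embedded tweet is rendered dynamically by JavaScript, so the element you see in the inspector is not in the raw HTML; you'll need something like Selenium to render the page before parsing it. The original HTML source does, however, contain the link to the tweet along with part of its text, so you could go after that instead: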

import requests
from bs4 import BeautifulSoup

url = 'https://thehill.com/homenews/news/376608-west-virginia-teachers-to-continue-strike-after-state-senate-passes-lower-raise'
# Send a browser-like User-Agent instead of the default one used by requests.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# In the static HTML, each embedded tweet is a <blockquote class="twitter-tweet"> placeholder.
tweets = soup.find_all('blockquote', {'class': 'twitter-tweet'})
for tweet in tweets:
    # Take the first link in the placeholder; skip blockquotes that contain none.
    link = tweet.find('a', href=True)
    if link is not None:
        print(link['href'])
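
The first link inside a placeholder is not necessarily the tweet itself; it may be a hashtag or a shortened URL. As a rough sketch of one way to filter the collected links, the helper below keeps only tweet permalinks and pulls out the user name and status ID. The function name `parse_tweet_link`, the list of accepted hosts, and the sample URLs are all assumptions made for illustration, not something taken from the page above.

```python
import re
from urllib.parse import urlparse

# Assumed permalink shape: /<user>/status/<digits> (sample values below are made up).
STATUS_PATH = re.compile(r"^/(?P<user>[^/]+)/status(?:es)?/(?P<tweet_id>\d+)")

# Hosts treated as tweet hosts in this sketch (an assumption, adjust as needed).
TWEET_HOSTS = {"twitter.com", "mobile.twitter.com", "x.com"}


def parse_tweet_link(href):
    """Return (user, tweet_id) for a tweet permalink, or None for any other link."""
    parts = urlparse(href)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in TWEET_HOSTS:
        return None
    match = STATUS_PATH.match(parts.path)
    if match is None:
        return None
    return match.group("user"), match.group("tweet_id")


# Hypothetical hrefs of the kind a tweet placeholder might contain.
sample_links = [
    "https://twitter.com/hashtag/example?src=hash",
    "https://t.co/abc123",
    "https://twitter.com/example_user/status/1234567890?ref_src=twsrc%5Etfw",
]
permalinks = [parsed for parsed in map(parse_tweet_link, sample_links) if parsed]
print(permalinks)
```

To use it with the loop above, collect every `href` from `tweet.find_all('a', href=True)` and pass each one through `parse_tweet_link`.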

Upvotes: 1
