Piyush Ghasiya

Reputation: 515

Web-scraping code using selenium and beautifulsoup not working properly

I wrote Python code to scrape the Sydney Morning Herald. The code first clicks the "show more" button until all results are loaded and then scrapes every article. The Selenium part works correctly, but I think there is a problem in the scraping part: it returns the desired fields (date, title, and content) for the first few articles (5-6), and after that it returns only the date and title, with no content.

import time
import csv
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

base = 'https://www.smh.com.au'
browser = webdriver.Safari(executable_path='/usr/bin/safaridriver')
wait = WebDriverWait(browser, 10)
browser.get('https://www.smh.com.au/search?text=cybersecurity')

while True:
    try:
        time.sleep(2)
        show_more = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, '_3we9i')))
        show_more.click()
    except Exception as e:
        print(e)
        break

soup = BeautifulSoup(browser.page_source,'lxml')
anchors = soup.find_all('a', {'tabindex': '-1'})
for anchor in anchors:
    browser.get(base + anchor['href'])
    sub_soup = BeautifulSoup(browser.page_source, 'html.parser')
    dateTag = sub_soup.find('time', {'class': '_2_zR-'})
    titleTag = sub_soup.find('h1', {'itemprop': 'headline'})
    contentTag = sub_soup.find_all('div', {'class': '_1665V undefined'})

    date = None
    title = None
    content = None

    if isinstance(dateTag, Tag):
        date = dateTag.get_text().strip()

    if isinstance(titleTag, Tag):
        title = titleTag.get_text().strip()

    # find_all always returns a list-like ResultSet, so test for emptiness rather than type
    if contentTag:
        content = ' '.join(c.get_text().strip() for c in contentTag)

    print(f'{date}\n {title}\n {content}\n')

    time.sleep(3)

# quit() ends the whole WebDriver session; close() only closes the current window
browser.quit()

Why does this code stop returning the content after a few articles? I don't understand it.

Thank you.

Upvotes: 0

Views: 204

Answers (1)

Maaz

Reputation: 2445

It's because of the message "You've reached your monthly free access limit". That is what the webpage displays after a few articles have been viewed, so from that point on there is no article content for your code to scrape.
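If you want the script to notice this instead of silently printing empty content, you could check each page's source for that message. This is only a minimal sketch: the helper name hit_paywall is hypothetical, and it assumes the page contains the wording quoted above in a single run of text, which may not match the real markup.

```python
import html
import re

# Assumption: this wording is taken from the message quoted in this answer;
# the text actually served by the site may differ.
PAYWALL_MARKER = "You've reached your monthly free access limit"


def _normalize(text):
    """Decode HTML entities, straighten curly apostrophes, collapse whitespace, lowercase."""
    text = html.unescape(text)
    text = text.replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).lower()


def hit_paywall(page_source, marker=PAYWALL_MARKER):
    """Return True if the (hypothetical) paywall message appears in the page source."""
    return _normalize(marker) in _normalize(page_source)


# Possible use inside the article loop from the question:
#     browser.get(base + anchor['href'])
#     if hit_paywall(browser.page_source):
#         print('Free access limit reached, stopping.')
#         break
```

This is a plain substring check, so it will miss the message if the site splits it across several tags; in that case, comparing against the text extracted by BeautifulSoup would be the more robust option.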

Upvotes: 2
