Reputation: 23
Hi community :)
I'm a bit stuck with my project. I'm trying to scrape the news from the websites https://mercomindia.com/category/solar/?_page=1 and https://www.pv-magazine.com/news/page/2/ with BeautifulSoup.
BS4 works correctly — I've already used my code on multiple other websites — but on these two (pv-magazine especially is very valuable to me) I don't find any HTML tags. I use find_all to find the date tags and a-href tags, but my lists are always empty. I tried switching from html5lib to html.parser and tried changing the tags, but nothing seems to work. Anyone got a clue what's wrong?
Here's my code:
import time

import requests
from bs4 import BeautifulSoup as soup

dates = []
news_bodies = []

# Capture the news from this url
scrape_url = 'https://mercomindia.com/category/solar/?_page=1'

# Parsing the HTML
r1 = requests.get(scrape_url)
page = r1.content

# Using BeautifulSoup to get the content of the correct HTML attributes
page_soup = soup(page, 'html.parser')
dates_soup = page_soup.find_all('span', {'class': 'entry-date'})
titles_soup = page_soup.find_all('a', {'class': '_self cvplbd'})

for i in range(0, len(dates_soup)):
    corpus = ""
    time.sleep(.1)  # Prevents site spam
    dates.append(dates_soup[i].time['text'])
    news_url = titles_soup[i]['href']
    r2 = requests.get(news_url)
    news_page = r2.content
    news_page_soup = soup(news_page, 'html.parser')
    news_text = news_page_soup.find_all('p')
    for news in news_text:
        if "<!--" not in news.text:
            text_p_tag = news.text.replace("\n", " ").replace("\t", " ")
            if "staff reporter" in text_p_tag:
                break
            corpus += text_p_tag
    news_bodies.append(corpus)
Upvotes: 2
Views: 73
Reputation: 812
To fix the issue of dates_soup producing an empty list, this will solve it:
dates = []
date = []
for tag in soup.find_all('span', {'class': 'entry-date'}):
    for anchor in tag.find_all('time'):
        date.append(anchor.get_text())

for i in range(0, len(date)):
    print(date[i])
    corpus = ""
    time.sleep(.1)  # Prevents site spam
    dates.append(date[i])
    news_url = titles_soup[i]['href']
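To see why get_text() works where the original tag.time['text'] lookup would not, here is an offline check against a made-up fragment that mirrors the markup the selectors above assume (the dates shown are placeholders, not live data):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the assumed listing markup
html = '''
<div>
  <span class="entry-date"><time datetime="2020-08-13">August 13, 2020</time></span>
  <span class="entry-date"><time datetime="2020-08-12">August 12, 2020</time></span>
</div>
'''
demo_soup = BeautifulSoup(html, 'html.parser')

date = []
for tag in demo_soup.find_all('span', {'class': 'entry-date'}):
    for anchor in tag.find_all('time'):
        # anchor['text'] would raise KeyError: a <time> tag normally has no
        # 'text' attribute. get_text() returns the tag's inner text instead.
        date.append(anchor.get_text())

print(date)  # ['August 13, 2020', 'August 12, 2020']
```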
Also, it's better to send headers to avoid getting a response error (like 403 Forbidden):
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site = "https://mercomindia.com/category/solar/?_page=1"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')
print(soup)
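As a side note, the header is attached to the Request object itself, so you can verify it is set without making any network call:

```python
from urllib.request import Request

site = "https://mercomindia.com/category/solar/?_page=1"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)

# urllib capitalizes header names internally, so query with 'User-agent'
print(req.get_header('User-agent'))  # Mozilla/5.0
print(req.full_url)
```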
Upvotes: 1
Reputation: 17358
You need to pass the relevant headers in order to get a proper HTML response. Pass a User-Agent in the headers.
import requests
from bs4 import BeautifulSoup as soup

dates = []
hrefs = []

# Capture the news from this url
scrape_url = 'https://mercomindia.com/category/solar/?_page=1'
headers = {'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}

# Parsing the HTML
r1 = requests.get(scrape_url, headers=headers)
page = r1.content

# Using BeautifulSoup to get the content of the correct HTML attributes
page_soup = soup(page, 'html.parser')
divs = page_soup.find("div", {"data-id": "pt-cv-page-1"}).find_all("div", class_="pt-cv-content-item")
for div in divs:
    hrefs.append(div.find("a")["href"])
    dates.append(div.find("span", class_="entry-date").get_text(strip=True))

print(dates)
print("-" * 10)
print(hrefs)
Output:
['August 13, 2020', 'August 13, 2020', 'August 13, 2020', 'August 13, 2020', 'August 13, 2020', 'August 13, 2020', 'August 13, 2020', 'August 12, 2020', 'August 12, 2020', 'August 12, 2020', 'August 12, 2020', 'August 11, 2020', 'August 11, 2020', 'August 10, 2020', 'August 8, 2020', 'August 7, 2020', 'August 7, 2020', 'August 7, 2020', 'August 7, 2020', 'August 7, 2020', 'August 6, 2020', 'August 6, 2020', 'August 6, 2020', 'August 6, 2020', 'August 6, 2020']
----------
['https://mercomindia.com/winners-gujarat-solar-auction/', 'https://mercomindia.com/bescom-issues-amendment/', 'https://mercomindia.com/ireland-lists-solar-wind-projects/', 'https://mercomindia.com/tata-power-registers-profits/', 'https://mercomindia.com/european-union-extends-countervailing-duty/', 'https://mercomindia.com/south-africa-sasol-invites-bids-solar/', 'https://mercomindia.com/power-finance-loans-solar-developer/', 'https://mercomindia.com/eesl-tender-solar-projects-maharashtra/', 'https://mercomindia.com/tender-reissued-25-mw/', 'https://mercomindia.com/gujarat-new-industrial-policy-solar/', 'https://mercomindia.com/interested-parties-solar-glass-imports/', 'https://mercomindia.com/eib-engie-off-grid-solar-uganda/', 'https://mercomindia.com/eesl-empanel-consultants-solar-projects/', 'https://mercomindia.com/reil-tenders-multicrystalline-solar-cells/', 'https://mercomindia.com/french-technique-solaire-expand-portfolio/', 'https://mercomindia.com/actis-acquires-solar-projects-acme/', 'https://mercomindia.com/renesola-power-raises-12-million/', 'https://mercomindia.com/andhra-waives-stamp-duty-solar/', 'https://mercomindia.com/cerc-approves-tariffs-solar-projects/', 'https://mercomindia.com/ayana-renewable-acquires-two-solar-projects/', 'https://mercomindia.com/amp-energy-tata-azure-o2-power-ntpc-solar-auction/', 'https://mercomindia.com/no-ists-charges-solar-wind-projects/', 'https://mercomindia.com/another-deadline-extension-renewable-power/', 'https://mercomindia.com/long-term-bcd-india-solar-roundtable/', 'https://mercomindia.com/central-electronics-bids-solar-ribbons/']
Upvotes: 1