Reputation: 13
When I try to scrape data from https://www.britannica.com/search?query=world+war+2 I can't find all the elements. I am specifically looking for everything inside the div element with the class search-feature-container (it's the content inside the info box at the top), but when I scrape it, find just returns None. This is my code:
import requests
from bs4 import BeautifulSoup

def scrape_britannica(product_name):
    ### SETUP ###
    URL_raw = 'https://www.britannica.com/search?query=' + product_name
    URL = URL_raw.strip().replace(" ", "+")
    ## gets the html from the url
    try:
        page = requests.get(URL)
    except:
        print("Could not find URL..")
    ## a way to come around scrape blocking
    soup = BeautifulSoup(page.content, 'html.parser')
    parent = soup.find("div", {"class": "search-feature-container"})
    print(parent)

scrape_britannica('carl barks')
I guess it has something to do with the content not being loaded yet when you first open the page, but I still don't know how to fix it. Or maybe it's because the website uses cookies. I'm literally looking for all the ideas I can get! Thx :D
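A quick way to test that guess is to check whether the class name appears anywhere in the raw HTML that requests gets back. This is just a diagnostic sketch; if it prints False, the info box is most likely built by JavaScript after the page loads:

import requests

# diagnostic sketch: does the class name appear anywhere in the raw HTML?
html = requests.get('https://www.britannica.com/search?query=carl+barks').text
print('search-feature-container' in html)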
Upvotes: 0
Views: 251
Reputation: 11515
You are dealing with a website that runs JavaScript to render its data once the page loads. You can use the following approach, which reads the script source of the page that contains the part you are looking for. That gives you the data as a dict, so you can do whatever you want with it.
import requests
from bs4 import BeautifulSoup
import json

r = requests.get("https://www.britannica.com/search?query=world+war+2")
soup = BeautifulSoup(r.text, 'html.parser')

# the data for the info box is embedded as JSON inside one of the script tags
script = soup.findAll(
    "script", {'type': 'text/javascript'})[15].get_text(strip=True)

# cut out the JSON object between the first "{" and the last "}"
start = script.find("{")
end = script.rfind("}") + 1
data = script[start:end]

n = json.loads(data)
print(json.dumps(n, indent=4))

# print(n.keys())
# print(n["topicInfo"]["description"])
Output:
{
    "toc": [
        {
            "id": 1,
            "title": "Introduction",
            "url": "/event/World-War-II"
        },
        {
            "id": 53531,
            "title": "Axis initiative and Allied reaction",
            "url": "/event/World-War-II#ref53531"
        },
        {
            "id": 53563,
            "title": "The Allies\u2019 first decisive successes",
            "url": "/event/World-War-II/The-Allies-first-decisive-successes"
        },
        {
            "id": 53576,
            "title": "The Allied landings in Europe and the defeat of the Axis powers",
            "url": "/event/World-War-II/The-Allied-landings-in-Europe-and-the-defeat-of-the-Axis-powers"
        }
    ],
    "topicInfo": {
        "topicId": 648813,
        "imageId": 74903,
        "imageUrl": "https://cdn.britannica.com/s:300x1000/26/188426-050-2AF26954/Germany-Poland-September-1-1939.jpg",
        "imageAltText": "World War II",
        "title": "World War II",
        "identifier": "1939\u20131945",
        "description": "World War II, conflict that involved virtually every part of the world during the years 1939\u201345. The principal belligerents were the Axis powers\u2014Germany, Italy, and Japan\u2014and the Allies\u2014France, Great Britain, the United States, the Soviet Union, and, to a lesser extent, China. The war was in many...",
        "url": "/event/World-War-II"
    }
}
Output of print(n.keys())
dict_keys(['toc', 'topicInfo'])
Output of print(n["topicInfo"]["description"])
World War II, conflict that involved virtually every part of the world during the years 1939–45. The principal belligerents were the Axis powers—Germany, Italy, and Japan—and the Allies—France, Great Britain, the United States, the Soviet Union, and, to a lesser extent, China. The war was in many...
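Note that the script index (15) is hard-coded and could shift if Britannica changes the page. A more defensive sketch, reusing soup and json from above and assuming the embedded payload is the only text/javascript script mentioning topicInfo, would locate the script by its content instead of its position:

# hedged sketch: pick the script by content rather than by a fixed index
for s in soup.findAll("script", {'type': 'text/javascript'}):
    text = s.get_text(strip=True)
    if "topicInfo" in text:  # assumption: this keyword marks the embedded payload
        n = json.loads(text[text.find("{"):text.rfind("}") + 1])
        break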
Upvotes: 1
Reputation: 12684
I would find all script tags and check whether the keyword featuredSearchTopic appears in each one. Then I convert the text to JSON (a dictionary) and access the 'description' value.
import requests
from bs4 import BeautifulSoup
import json

def scrape_britannica(product_name):
    ### SETUP ###
    URL_raw = 'https://www.britannica.com/search?query=' + product_name
    URL = URL_raw.strip().replace(" ", "+")
    ## gets the html from the url
    try:
        page = requests.get(URL)
    except:
        print("Could not find URL..")
    ## a way to come around scrape blocking
    soup = BeautifulSoup(page.content, 'html.parser')
    #print(soup)
    for parent in soup.findAll("script"):  #, {"class": "search-feature-container"})
        if 'featuredSearchTopic' in str(parent):
            # the script assigns the JSON payload to a variable, so take the
            # part after "=" and drop the trailing ";" before parsing it
            txt = json.loads(parent.text.replace(';', '').split('=')[-1])
            print(txt.get('topicInfo').get('description'))

scrape_britannica('carl barks')
Result:
comic strip: Institutionalization: …Disney artists of them all, Carl Barks, sole creator of more than 500 of the best Donald Duck and other stories, was rescued from the oblivion to which the Disney policy of anonymity would consign him to become a cult figure. His Collected Works ran to 30 luxurious folio volumes.…...
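One caveat: for queries without a featured info box there is no script containing featuredSearchTopic, so the loop simply prints nothing, and if the parsed payload ever lacks 'topicInfo', the chained .get() calls would raise an AttributeError. A slightly more defensive version of the two lines inside the if block (just a sketch):

            txt = json.loads(parent.text.replace(';', '').split('=')[-1])
            # guard against a payload without 'topicInfo' (shape may vary)
            topic = txt.get('topicInfo') or {}
            print(topic.get('description', 'No description found.'))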
Upvotes: 1