Can't scrape <h3> tag from page

Question

It seems I can scrape any tag and class on this page except h3. It keeps returning None or an empty list. I'm trying to get this h3 tag:

[Screenshot: page source with the target tag highlighted]

...on the following webpage:

https://www.empireonline.com/movies/features/best-movies-2/

And this is the code I use:

from bs4 import BeautifulSoup
import requests
from pprint import pprint

URL = "https://www.empireonline.com/movies/features/best-movies-2/"

response = requests.get(URL)
web_html = response.text

soup = BeautifulSoup(web_html, "html.parser")

movies = soup.findAll(name = "h3" , class_ = "jsx-4245974604")

movies_text=[]

for item in movies:
    result = item.getText()
    movies_text.append(result)

print(movies_text)

Can you please help with the solution for this problem?

David · Accepted Answer

As others have mentioned, this is dynamic content: it is only generated when the page is opened and run in a browser, so it is not in the HTML that requests downloads. That is why you can't find the class "jsx-4245974604" with BS4.

If you print out your "soup" variable, you can see that the class isn't there. But if you simply want the names of the movies, you can use another part of the HTML in this case.
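Instead of reading the whole printed soup, you can also count how many elements match the selector you expect. The following is a minimal sketch, not part of the original code: `count_matches`, `sample_html`, and the class names "title" and "poster" are made up for illustration, and `sample_html` stands in for `response.text`.

```python
from bs4 import BeautifulSoup


def count_matches(html: str, tag: str, css_class: str) -> int:
    """Return how many <tag> elements with the given class exist in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=css_class))


# Hypothetical stand-in for response.text: the h3 headings are missing
# because a script would only add them later, inside the browser.
sample_html = """
<html><body>
  <div id="app"></div>
  <img class="poster" alt="Example Movie">
  <script>/* builds the h3 headings at runtime */</script>
</body></html>
"""

h3_count = count_matches(sample_html, "h3", "title")
img_count = count_matches(sample_html, "img", "poster")
print(h3_count, img_count)
```

A count of zero for the selector you copied from the browser's inspector is a strong hint that the element is created after the page loads. With the real page you would pass `web_html` and the class from the question instead of the sample values.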

The movie name is in the alt attribute of the image (and also in many other parts of the HTML).

import requests
from bs4 import BeautifulSoup

URL = "https://www.empireonline.com/movies/features/best-movies-2/"

response = requests.get(URL)
web_html = response.text

soup = BeautifulSoup(web_html, "html.parser")

# The <img> tags are in the initial HTML; each alt attribute holds a movie name.
movies = soup.find_all("img", class_="jsx-952983560")

movies_text = []

for item in movies:
    result = item.get("alt")
    movies_text.append(result)

print(movies_text)

If you run into this issue in the future, print out the initial HTML that soup received and check by eye whether the information you need is there.
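If the printed HTML is too long to check by eye, a small search can point you to where a value you already know is stored. This is only a sketch under stated assumptions: `locate_text` is a hypothetical helper, and `sample_html` with its "Example Movie" title is invented test data, not the real page.

```python
from bs4 import BeautifulSoup, NavigableString


def locate_text(html: str, needle: str) -> list[tuple[str, str]]:
    """List (tag name, location) pairs where needle occurs in the parsed HTML.

    location is "text" for the tag's own text, or the name of the attribute
    that contains the needle.
    """
    soup = BeautifulSoup(html, "html.parser")
    needle = needle.lower()
    hits = []
    for tag in soup.find_all(True):
        # Look only at the tag's direct text nodes, so that every ancestor
        # of a matching element is not reported as well.
        own_text = "".join(
            child for child in tag.children if isinstance(child, NavigableString)
        )
        if needle in own_text.lower():
            hits.append((tag.name, "text"))
        for attr, value in tag.attrs.items():
            # Multi-valued attributes such as class are returned as lists.
            if isinstance(value, list):
                value = " ".join(value)
            if needle in str(value).lower():
                hits.append((tag.name, attr))
    return hits


# Hypothetical stand-in for response.text.
sample_html = """
<html><body>
  <img class="poster" alt="Example Movie" src="poster.jpg">
  <script>{"title": "Example Movie"}</script>
</body></html>
"""

hits = locate_text(sample_html, "Example Movie")
print(hits)
```

On a real page you would pass `web_html` and a movie name you can see in the browser. Each hit names a tag and attribute you can then target with `find_all`, the same way the img/alt approach above does.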
