Reputation: 29
Hello everyone, I hope you can help me.
I am trying to get a tag from a site: https://www.empireonline.com/movies/features/best-movies-2/
It seems that the requests library does not return the whole HTML of the page.
After many attempts, I saved the HTML I received to a text file to check whether it contains an h3 tag:
from bs4 import BeautifulSoup
import requests
my_url = "https://www.empireonline.com/movies/features/best-movies-2/"
site_response = requests.get(my_url)
# Save the raw response so it can be compared with what dev tools shows
with open("site_file.txt", "w", encoding="utf-8") as file:
    file.write(site_response.text)
When I looked at the text file I created, there was no h3 tag in it.
The tag is there when I inspect the site in Chrome's dev tools.
Here are some of the things I tried to get the data:
soup = BeautifulSoup(site_response.text, "html.parser")
soup.find_all('h3')
soup.find_all('div', {'class': 'jsx-4245974604'})
and many other variations.
I hope someone can help me, please.
Upvotes: 1
Views: 151
Reputation: 564
You need to render the page. You can do that with the requests-html library:
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://www.empireonline.com/movies/features/best-movies-2/'
r = session.get(url)
r.html.render()
titles = r.html.find('h3')
for t in titles:
    print(t.text)
The output is the same as in Andrej Kesely's answer.
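To confirm that rendering is really needed, it can help to count the h3 tags in the raw response first. Below is a minimal sketch using only the standard library. The `TagCounter` class, the `count_tag` helper and the `raw_html` sample are made up for illustration; the sample stands in for `site_response.text` and is not the real page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags by name while the parser walks the markup."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html_text, tag):
    """Return how many times `tag` opens in the given HTML text."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    return counter.counts.get(tag, 0)


# Hypothetical stand-in for site_response.text: the title exists only inside
# a JSON payload in a script element, not as an <h3> element.
raw_html = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"titleText": "1) Example Movie"}'
    '</script></body></html>'
)

h3_count = count_tag(raw_html, "h3")
script_count = count_tag(raw_html, "script")
print(h3_count, script_count)
```

If the h3 count is zero for the raw response but the tags show up in dev tools, the elements are most likely created by JavaScript after the page loads, which is the case where rendering helps.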
Upvotes: 0
Reputation: 195408
The image titles are embedded in the page as JSON, so BeautifulSoup doesn't find them as h3 tags. You can use the json module to parse them:
import json
import requests
from bs4 import BeautifulSoup
url = "https://www.empireonline.com/movies/features/best-movies-2/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])
# uncomment to print all data:
# print(json.dumps(data, indent=4))
def find(d):
    if isinstance(d, dict):
        for k, v in d.items():
            if k.startswith("ImageMeta:"):
                yield v
            else:
                yield from find(v)
    elif isinstance(d, list):
        for v in d:
            yield from find(v)
for d in find(data):
    print(d["titleText"])
Prints:
100) Stand By Me
99) Raging Bull
98) Amelie
97) Titanic
96) Good Will Hunting
95) Arrival
94) Lost In Translation
93) The Princess Bride
92) The Terminator
91) The Prestige
90) No Country For Old Men
89) Shaun Of The Dead
88) The Exorcist
87) Predator
86) Indiana Jones And The Last Crusade
85) Léon
84) Rocky
83) True Romance
82) Some Like It Hot
81) The Social Network
80) Spirited Away
79) Captain America: Civil War
78) Oldboy
77) Toy Story
76) A Clockwork Orange
75) Fargo
74) Mulholland Dr.
73) Seven Samurai
72) Rear Window
71) Hot Fuzz
70) The Lion King
69) Singin' In The Rain
68) Ghostbusters
67) Memento
66) Return Of The Jedi
65) Avengers Assemble
64) L.A. Confidential
63) Donnie Darko
62) La La Land
61) Forrest Gump
60) American Beauty
59) E.T. – The Extra Terrestrial
58) Inglourious Basterds
57) Whiplash
56) Reservoir Dogs
55) Pan's Labyrinth
54) Vertigo
53) Psycho
52) Once Upon A Time In The West
51) It's A Wonderful Life
50) Lawrence Of Arabia
49) Trainspotting
48) The Silence Of The Lambs
47) Interstellar
46) Citizen Kane
45) Drive
44) Gladiator
43) One Flew Over The Cuckoo's Nest
42) There Will Be Blood
41) Eternal Sunshine Of The Spotless Mind
40) 12 Angry Men
39) Saving Private Ryan
38) Mad Max: Fury Road
37) The Thing
36) The Departed
35) The Shining
34) Guardians Of The Galaxy
33) Schindler's List
32) The Usual Suspects
31) Taxi Driver
30) Seven
29) The Big Lebowski
28) Casablanca
27) The Good, The Bad And The Ugly
26) Heat
25) Terminator 2: Judgment Day
24) The Matrix
23) The Lord Of The Rings: The Two Towers
22) Apocalypse Now
21) 2001: A Space Odyssey
20) Die Hard
19) Jurassic Park
18) Inception
17) Fight Club
16) The Lord Of The Rings: The Return Of The King
15) Aliens
14) Alien
13) Blade Runner
12) The Godfather Part II
11) Back To The Future
10) The Lord Of The Rings: The Fellowship Of The Ring
9) Star Wars
8) Jaws
7) Raiders Of The Lost Ark
6) Goodfellas
5) Pulp Fiction
4) The Shawshank Redemption
3) The Dark Knight
2) The Empire Strikes Back
1) The Godfather
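To see how the recursive search behaves without fetching the page, here is a small sketch that runs the same idea on in-memory data. The function name `find_by_prefix`, the `sample` structure and its keys ("page", "items", "Article:x", "ImageMeta:a1", "ImageMeta:b2") are invented for illustration and are not taken from the real payload.

```python
def find_by_prefix(node, prefix):
    """Yield every dict value whose key starts with `prefix`, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith(prefix):
                # Matching key: yield its value and do not descend further.
                yield value
            else:
                yield from find_by_prefix(value, prefix)
    elif isinstance(node, list):
        for item in node:
            yield from find_by_prefix(item, prefix)
    # Strings, numbers, booleans and None are leaves and yield nothing.


# Made-up data, loosely shaped like a nested JSON payload.
sample = {
    "page": {
        "items": {
            "ImageMeta:a1": {"titleText": "2) Example Movie B"},
            "Article:x": {
                "images": [
                    {"ImageMeta:b2": {"titleText": "1) Example Movie A"}},
                ],
            },
        }
    }
}

titles = [meta["titleText"] for meta in find_by_prefix(sample, "ImageMeta:")]
print(titles)
```

Because the generator walks dicts and lists alike, it does not depend on knowing the exact path to the entries, which is convenient when the layout of the embedded JSON is not documented.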
Upvotes: 1