Reputation: 22440
I've written a Python script that uses BeautifulSoup with CSS selectors to parse movie names and their corresponding features from a webpage. When I execute the script, it fetches the required items only partially. How can I get all the movie names and their features?
This is what I tried:
import requests
from bs4 import BeautifulSoup
from itertools import zip_longest
with requests.Session() as session:
    r = session.get('https://yts.am/browse-movies')
    soup = BeautifulSoup(r.text,"lxml")
    items = {item.text:itm.text for item,itm in zip(soup.select(".browse-movie-title"),soup.select("figcaption h4"))}
    print(items)
The results I'm getting look like this:
{'Halloween H20: 20 Years Later': '5.7 / 10', 'Rabbit': 'Horror', and so on-----
I suppose it's because of the zip() function. I imported zip_longest(), which might do the trick, but I couldn't make use of it.
The HTML elements that hold the features of a single movie are:
<figcaption class="hidden-xs hidden-sm">
<span class="icon-star"></span>
<h4 class="rating">5.7 / 10</h4>
<h4>Horror</h4>
<h4>Thriller</h4>
<span class="button-green-download2-big">View Details</span>
</figcaption>
This is the relevant HTML holding the title of a single movie:
<div class="browse-movie-bottom">
<a href="https://yts.am/movie/halloween-h20-20-years-later-1998" class="browse-movie-title">Halloween H20: 20 Years Later</a>
<div class="browse-movie-year">1998</div>
</div>
Expected output for a single movie:
'Halloween H20: 20 Years Later': ['5.7 / 10','Horror','Thriller']
Upvotes: 0
Views: 36
Reputation: 1123
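You are selecting all the elements at once, which gives you two flat lists: one title per movie, but up to three h4 elements per movie. That makes them hard to group, and zip only pairs the lists position by position, so it is not what you are looking for. Simply iterate through the movie cards and select within each one.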
import requests
from bs4 import BeautifulSoup
with requests.Session() as session:
    r = session.get('https://yts.am/browse-movies')
    soup = BeautifulSoup(r.text,"lxml")
    for movie in soup.select("div.browse-movie-wrap"):
        title = movie.select_one('a.browse-movie-title').text
        details = [detail.text for detail in movie.select('h4')]
        print((title, details))
The output will be:
('Heavy Weights', ['6.7 / 10', 'Comedy', 'Drama'])
('Get Shorty', ['6.9 / 10', 'Comedy', 'Crime'])
('Fred Claus', ['5.6 / 10', 'Comedy', 'Family'])
("Free Willy: Escape from Pirate's Cove", ['5.2 / 10'])
('Halloween: Resurrection', ['4.1 / 10', 'Comedy', 'Horror'])
('Ant-Man and the Wasp', ['7.2 / 10', 'Action', 'Adventure'])
('Rabbit', ['6.2 / 10', 'Thriller'])
('Halloween H20: 20 Years Later', ['5.7 / 10', 'Horror', 'Thriller'])
("Madeline's Madeline", ['6.9 / 10'])
('Halloween 5', ['5.2 / 10'])
('Halloween: The Curse of Michael Myers', ['4.9 / 10', 'Action', 'Horror'])
('Deck the Halls', ['4.9 / 10', 'Comedy', 'Family'])
('Halloween 4: The Return of Michael Myers', ['5.9 / 10', 'Horror', 'Thriller'])
('Dark Horse', ['6 / 10', 'Action', 'Comedy'])
('Double Whammy', ['5.7 / 10', 'Comedy', 'Crime'])
('Beyond Borders', ['6.5 / 10', 'Adventure', 'Drama'])
('Dead Man Running', ['6 / 10', 'Action', 'Crime'])
('Cougar Hunting', ['3.7 / 10', 'Comedy', 'Romance'])
('Cabin Boy', ['5.2 / 10', 'Adventure', 'Comedy'])
('Illang: The Wolf Brigade', ['5.5 / 10', 'Action', 'Sci-Fi'])
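If you want a dict shaped like the expected output in the question, the same per-card loop can fill one. Below is a minimal offline sketch rather than a definitive version: the HTML string is a hypothetical two-card sample modelled on the markup in the question, it assumes each card is wrapped in div.browse-movie-wrap (as in the selector above), and it uses the built-in "html.parser" so that lxml is not needed. It also reproduces the misalignment you saw with zip.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two cards modelled on the question's markup.
# Assumption: every card sits inside a div.browse-movie-wrap wrapper.
HTML = """
<div class="browse-movie-wrap">
  <figcaption class="hidden-xs hidden-sm">
    <h4 class="rating">5.7 / 10</h4>
    <h4>Horror</h4>
    <h4>Thriller</h4>
  </figcaption>
  <div class="browse-movie-bottom">
    <a href="#" class="browse-movie-title">Halloween H20: 20 Years Later</a>
  </div>
</div>
<div class="browse-movie-wrap">
  <figcaption class="hidden-xs hidden-sm">
    <h4 class="rating">6.2 / 10</h4>
    <h4>Thriller</h4>
  </figcaption>
  <div class="browse-movie-bottom">
    <a href="#" class="browse-movie-title">Rabbit</a>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Flat selection: 2 titles but 5 h4 elements, so zip() pairs them purely
# by position and the second title receives the first movie's genre.
flat = dict(zip(
    (a.text for a in soup.select(".browse-movie-title")),
    (h4.text for h4 in soup.select("figcaption h4")),
))

# Per-card selection: each title only sees the h4 elements of its own card.
grouped = {}
for card in soup.select("div.browse-movie-wrap"):
    title = card.select_one("a.browse-movie-title").text
    grouped[title] = [h4.text for h4 in card.select("figcaption h4")]

print(flat)
print(grouped)
```

Here flat ends up mapping 'Rabbit' to 'Horror', which is the wrong pairing from the question, while grouped keeps each movie's rating and genres together. Note that a dict silently overwrites entries if two movies share a title, so a list of tuples may be safer on the real page.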
Upvotes: 1