Reputation: 11
I'm trying to scrape all the Snapple facts on https://www.snapple.com/real-facts, and since I didn't find anything useful online, I decided to write my own script:
from bs4 import BeautifulSoup as soup
import requests

data = requests.get('https://www.snapple.com/real-facts')
result_list = []
soup = soup(data.text, 'html.parser')
divs = soup.find("div", {'id':'facts'})
for div in divs:
    fact_li = div.find('li')
    for fact in fact_li:
        spans = fact.find('span', {'class':'description'})
        for span in spans:
            a = fact.find('a')
            result_list.append(a)
print(result_list)
When I run this, it returns:
Traceback (most recent call last):
  File "snapplefactscrape.py", line 11, in <module>
    for fact in fact_li:
TypeError: 'int' object is not iterable
I get what that means, but I don't understand why fact_li is an int, or how I can prevent it from being one.
Help would be appreciated :)
Upvotes: 0
Views: 88
Reputation: 12255
To get all elements, use find_all instead of find.
You don't need three loops to get all the links. Using select with the CSS selector #facts .description a will give you them:
from bs4 import BeautifulSoup
import requests

base_url = 'https://www.snapple.com'
data = requests.get(f'{base_url}/real-facts')
soup = BeautifulSoup(data.text, 'html.parser')
# Every <a> inside a .description element under #facts
links = soup.select('#facts .description a')
for link in links:
    print(link.text, base_url + link['href'])
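As a side note on the string concatenation above, here is a minimal sketch of building the absolute URL with urljoin from the standard library instead. It assumes the href values may be either relative or already absolute; the two hrefs in the list are hypothetical and not taken from the real page.

```python
from urllib.parse import urljoin

base_url = 'https://www.snapple.com'

# Hypothetical href values, used only for illustration.
hrefs = ['/real-facts/1', 'https://www.snapple.com/real-facts/2']

# urljoin resolves relative paths against base_url and leaves
# already-absolute URLs unchanged.
absolute = [urljoin(base_url, href) for href in hrefs]
for url in absolute:
    print(url)
```

Plain concatenation works as long as every href is a root-relative path; urljoin is just a bit more forgiving if the page mixes both forms.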
But if you want to use loops:
result_list = []
divs = soup.find_all('div', {'id': 'facts'})
for div in divs:
    fact_li = div.find_all('li')
    for fact in fact_li:
        spans = fact.find_all('span', {'class': 'description'})
        for span in spans:
            # Collect the links inside each description span
            result_list.extend(span.find_all('a'))
Upvotes: 1
Reputation: 606
When you iterate with for div in divs, div is not always a tag. Because find returns a single tag rather than a list, looping over it yields that tag's children, and some of those children are plain strings. So instead of the bs4 find method on tags, you're calling the find method on strings, which returns -1 if the substring is not found.
In the first iteration, for example, the value of div is "\n". This is a good case for using a debugger to inspect the variables, or even just print to check their values and types.
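A small sketch of that check, run against a made-up snippet of HTML rather than the real page (the markup and the names container, children and facts are assumptions for illustration only):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup standing in for the real page.
html = '<div id="facts">\n<ul><li>Fact one</li><li>Fact two</li></ul>\n</div>'
container = BeautifulSoup(html, 'html.parser').find('div', {'id': 'facts'})

# Iterating over a Tag walks its children, including whitespace strings.
children = list(container)
first = children[0]
print(type(first).__name__, repr(first))  # type and value of the first child
print(first.find('li'))                   # str.find here, so the result is an int

# Keeping only Tag children avoids calling str.find by accident.
tag_children = [child for child in children if isinstance(child, Tag)]
facts = [li.get_text() for tag in tag_children for li in tag.find_all('li')]
print(facts)
```

Filtering with isinstance is only one way around it; using find_all on the container, as in the other answer, sidesteps the text nodes entirely.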
Upvotes: 1