Reputation: 305
I am using the BeautifulSoup module to find the images and site links for different kinds of jelly fungus, write them to an HTML file, and display them to the user. Here is my code:
import os
import cfscrape
import webbrowser
from bs4 import BeautifulSoup

spider = cfscrape.CloudflareScraper()

#Creating a session.
with spider:
    #Scraping the contents of the main page.
    data = spider.get("https://en.wikipedia.org/wiki/Jelly_fungus").content
    #Grabbing data on each of the types of jelly fungi.
    soup = BeautifulSoup(data, "lxml")
    ul_tags = soup.find_all("ul")
    mushroom_hrefs = ul_tags[1]
    #Creating list to store page links.
    links = []
    #Grabbing the page links for each jelly fungi, and appending them to the links list.
    for mushroom in mushroom_hrefs.find_all("li"):
        for link in mushroom.find_all("a", href=True):
            links.append(link["href"])
    #Creating list to store image links.
    images = []
    #Grabbing the image links from each jelly fungi's page, and appending them to the images list.
    for i, link in enumerate(links, start=1):
        link = "https://en.wikipedia.org/" + link
        data = spider.get(link).content
        soup = BeautifulSoup(data, "lxml")
        fungus_info = soup.find("table", {"class": "infobox biota"})
        print(i)
        img = fungus_info.find("img")
        images.append("https:" + img["src"])

#Checking for an existing html file, if there is one, delete it.
if os.path.isfile("fungus images.html"):
    os.remove("fungus images.html")

#Iterating through the jelly fungi images and placing them accordingly in the html file.
for i, img in enumerate(images):
    links[i] = "https://en.wikipedia.org" + links[i]
    with open("fungus images.html", "a") as html:
        if i == 0:
            html.write(f"""
            <DOCTYPE! html
            <html>
            <head>
            <title>Fungus</title>
            </head>
            <body>
            <h1>Fungus Images</h1>
            <a href="{links[i]}">
            <img src="{img}">
            </a>
            """)
        elif i < len(images):
            html.write(f"""
            <a href="{links[i]}">
            <img src="{img}">
            </a>
            """)
        else:
            html.write(f"""
            <a href="{links[i]}">
            <img src="{img}">
            </a>
            </body>
            </html>
            """)

webbrowser.open("fungus images.html")
In the loop over links, I iterate through each fungus's page to find the information table containing its picture. This works well for the first 17 pages, but for some reason I get a NoneType value on the Tremellodendron fungus. I don't know why this is happening, as its table has the same class as the other fungi.
Upvotes: 0
Views: 56
Reputation: 774
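The NoneType comes from the Wikipedia page you are scraping. The red circle in this image shows what your link actually is at the index where you expect the Tremellodendron link.
Its href is
#cite-note-3
which is a footnote anchor rather than a link to a Wikipedia article. Prefixed with https://en.wikipedia.org/, it loads a page that has no "infobox biota" table, so soup.find returns None and the following fungus_info.find("img") call fails.
Make sure each link points to a page and not a reference ;)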
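One way to apply this is to filter the collected hrefs before requesting them. The following is only a minimal sketch: the helper names is_article_href and article_urls are hypothetical, and it assumes that article links on the page start with /wiki/ while footnote references start with #.

```python
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"


def is_article_href(href):
    """Return True if href looks like a path to a Wikipedia article.

    Assumption: article links start with "/wiki/", whereas footnote
    references are fragment-only hrefs that start with "#".
    """
    return href.startswith("/wiki/")


def article_urls(hrefs):
    """Keep only article hrefs and turn them into absolute URLs."""
    # urljoin avoids the doubled slash produced by concatenating
    # "https://en.wikipedia.org/" with "/wiki/...".
    return [urljoin(BASE_URL, href) for href in hrefs if is_article_href(href)]


# Example hrefs, including the footnote anchor from the question.
hrefs = ["/wiki/Tremella", "#cite-note-3", "/wiki/Tremellodendron"]
print(article_urls(hrefs))
```

In the question's script, the same check could be applied where links.append(link["href"]) is called. It may also be worth skipping a page when soup.find returns None instead of calling .find("img") on the result, since not every article is guaranteed to have an infobox.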
Upvotes: 1