BeautifulSoup "find" method returning NoneType inexplicably

Question

I am using the BeautifulSoup module to find the images and page links for different kinds of jelly fungi, write them to an HTML file, and display them to the user. Here is my code:

import os
import cfscrape
import webbrowser
from bs4 import BeautifulSoup

spider = cfscrape.CloudflareScraper()

#Creating a session.
with spider:
    #Scraping the contents of the main page.
    data = spider.get("https://en.wikipedia.org/wiki/Jelly_fungus").content

    #Grabbing data on each of the types of jelly fungi.
    soup = BeautifulSoup(data, "lxml")
    ul_tags = soup.find_all("ul")
    mushroom_hrefs = ul_tags[1]

    #Creating list to store page links.
    links = []

    #Grabbing the page links for each jelly fungus, and appending them to the links list.
    for mushroom in mushroom_hrefs.find_all("li"):
        for link in mushroom.find_all("a", href=True):
            links.append(link["href"])

    #Creating list to store image links.
    images = []

    #Grabbing the image links from each jelly fungus's page, and appending them to the images list.
    for i, link in enumerate(links, start=1):
        link = "https://en.wikipedia.org/" + link
        data = spider.get(link).content

        soup = BeautifulSoup(data, "lxml")
        fungus_info = soup.find("table", {"class": "infobox biota"})
        print(i)

        img = fungus_info.find("img")
        images.append("https:" + img["src"])

#Checking for an existing html file; if there is one, delete it.
if os.path.isfile("fungus images.html"):
    os.remove("fungus images.html")

#Iterating through the jelly fungi images and placing them accordingly in the html file.
for i, img in enumerate(images):
    links[i] = "https://en.wikipedia.org" + links[i]
    with open("fungus images.html", "a") as html:
        if i == 0:
            html.write(f"""


Fungus


Fungus Images



            """)

        elif i < len(images):
            html.write(f"""



            """)

        else:
            html.write(f"""





            """)

webbrowser.open("fungus images.html")

In the second loop, I iterate through each fungus's page in order to find the information table containing its picture. This works well for the first 17 pages, but for some reason "find" returns None on the Tremellodendron fungus. I don't know why this is happening, as its table has the same class as the other fungi.

michmich112 · Accepted Answer

The NoneType comes from the Wikipedia page you are scraping. The red circle in this image shows what your link actually is at the index where you expect the Tremellodendron link. Its href is #cite_note-3, which is a footnote reference and not a link to a Wikipedia page per se. The URL built from it does not lead to a fungus page, so there is no infobox table, "find" returns None, and the next call on that result fails. Make sure each link points to a page and not a reference ;)
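
One way to act on this is to filter the hrefs before requesting them and to check the result of "find" before using it. The following is only a minimal sketch, assuming that article links on the page start with /wiki/ and that image src values are protocol-relative as in the question's code. The names BASE_URL, article_urls, and infobox_image_url are made up for illustration and are not part of BeautifulSoup.

```python
from urllib.parse import urljoin

# Assumed base for the relative hrefs collected in the question.
BASE_URL = "https://en.wikipedia.org"


def article_urls(hrefs):
    """Keep only hrefs that look like article links and make them absolute."""
    urls = []
    for href in hrefs:
        # Footnote anchors such as "#cite_note-3" and external links are
        # skipped; this assumes article links start with "/wiki/".
        if href.startswith("/wiki/"):
            urls.append(urljoin(BASE_URL, href))
    return urls


def infobox_image_url(soup):
    """Return the infobox image URL for a parsed page, or None if absent.

    `soup` is expected to be a BeautifulSoup object (anything with a
    compatible find() method works).
    """
    table = soup.find("table", {"class": "infobox biota"})
    if table is None:
        # No infobox on this page: report it instead of raising an error.
        return None
    img = table.find("img")
    if img is None:
        return None
    # src is assumed to be protocol-relative, e.g. "//upload.wikimedia.org/...".
    return "https:" + img["src"]
```

In the question's code, this would mean looping over article_urls(links) instead of links, and appending to images only when infobox_image_url(soup) does not return None. Note that the /wiki/ check is a rough filter: it would still let through non-article pages under /wiki/, which the None check then handles.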
