CODE_BRV

Reputation: 3

Issues Scraping the Web with Python

I want to scrape the web with Python and I am running into some problems. Here is my code:

from urllib import request
from bs4 import BeautifulSoup

pageURL="https://gamesnacks.com/embed/games/omnomrun"
rawPage=request.urlopen(pageURL)

soup=BeautifulSoup(rawPage, "html5lib")

content=soup.article

linksList=[]


for link in content.find_all('a'):
    url=link.get("href")
    img=link.get("src")
    text=link.span.text

linksList.append({"url":"url","img":"img","text":"text"})

try:
    url=link.get("href")
    img=link.get("src")
    text=link.span.text
    linksList.append({"url":"url","img":"img","text":"text"})
except AttributeError:
    pass

import json

with open("links.json","w",encoding="utf-8") as links_file:
    json.dump(linksList,links_file,ensure_ascii=False)

print("the work is done")

It gives an error on the line for link in content.find_all('a'):

I have already tried some online help but it didn't work out.

Upvotes: -1

Views: 76

Answers (1)

Peter Badida

Reputation: 12179

You define content as soup.article, but soup.article is None, so you get this error:

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    for link in content.find_all('a'):
AttributeError: 'NoneType' object has no attribute 'find_all'

because None is not a BeautifulSoup tag, so it has none of a tag's methods, such as find_all().
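A minimal sketch of guarding against that None before calling find_all(). The markup below is a hypothetical stand-in with no <article> element (the real page is not fetched here), and "html.parser" is used only to keep the example free of the html5lib dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup with no <article> element.
html = "<html><body><div><a href='/play'><span>Play</span></a></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.article is shorthand for soup.find("article"); it is None when nothing matches.
content = soup.article

if content is None:
    # Nothing to iterate over, so avoid calling find_all() on None.
    links = []
else:
    links = content.find_all("a")
```

With this check the script no longer crashes, but links stays empty until content points at an element that actually exists in the page.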

You need to locate the content some other way, using whichever element actually holds it.

Try soup.find_all("article") and iterate over the result, which also handles a page with multiple <article> tags. However, having visited the website and checked its source, I don't see an <article> tag anywhere. That explains why soup.article is None, and it means find_all("article") would most likely return nothing useful either.
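As a rough illustration of the find_all("article") approach, the sketch below iterates over every <article> and collects its links. The helper name collect_links and both sample strings are made up for this example, and "html.parser" is assumed in place of html5lib:

```python
from bs4 import BeautifulSoup

def collect_links(html):
    """Collect href and text for every <a> inside every <article> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all() returns an empty result when nothing matches, so this loop
    # simply does not run instead of raising AttributeError.
    for article in soup.find_all("article"):
        for link in article.find_all("a"):
            links.append({"url": link.get("href"), "text": link.get_text(strip=True)})
    return links

# Made-up sample markup: one page with two <article> tags, one with none.
with_articles = (
    "<article><a href='/a'><span>First</span></a></article>"
    "<article><a href='/b'><span>Second</span></a></article>"
)
without_articles = "<div><a href='/c'><span>Third</span></a></div>"

found = collect_links(with_articles)
missing = collect_links(without_articles)
```

For a page without any <article> tag, as appears to be the case here, the result is just an empty list, so a different element has to be targeted to get the links.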

Upvotes: 1
