Reputation: 3
I want to scrape the web with Python and I am running into some problems. Here is my code:
import json
from urllib import request
from bs4 import BeautifulSoup

pageURL="https://gamesnacks.com/embed/games/omnomrun"
rawPage=request.urlopen(pageURL)
soup=BeautifulSoup(rawPage, "html5lib")
content=soup.article
linksList=[]
for link in content.find_all('a'):
    try:
        url=link.get("href")
        img=link.get("src")
        text=link.span.text
        # Store the extracted values, not the literal strings "url", "img", "text"
        linksList.append({"url":url,"img":img,"text":text})
    except AttributeError:
        # Skip links that have no <span> child
        pass

with open("links.json","w",encoding="utf-8") as links_file:
    json.dump(linksList,links_file,ensure_ascii=False)
print("the work is done")
It raises an error on this line:
for link in content.find_all('a'):
I have already looked for help online, but nothing I tried worked.
Upvotes: -1
Views: 76
Reputation: 12179
You define content as soup.article, but soup.article is None here, so you get this error:
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    for link in content.find_all('a'):
AttributeError: 'NoneType' object has no attribute 'find_all'
This happens because None is not a BeautifulSoup object, so it has none of BeautifulSoup's methods, such as find_all().
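As a minimal sketch of the point above: the snippet below parses a small, made-up HTML string (sample_html is a stand-in, not the real page) that has no <article> tag, and checks for None before calling find_all(). It uses the built-in "html.parser" so it runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: it has a link but no <article> tag.
sample_html = '<html><body><div><a href="/a"><span>A</span></a></div></body></html>'
soup = BeautifulSoup(sample_html, "html.parser")

# soup.article is shorthand for soup.find("article"); it is None when nothing matches.
content = soup.article

if content is None:
    links = []
    message = "no <article> tag found"
else:
    links = content.find_all("a")
    message = f"found {len(links)} links inside <article>"

print(message)
```

Guarding like this turns the AttributeError into an explicit message, which makes it easier to see that the selector, not the loop, is the problem.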
You need to find a better way to locate the content you are after, whichever element actually holds it.
You could try soup.find_all("article") and iterate over the result, which also covers pages with multiple article tags. However, having visited the website and checked its source, I don't see any <article> tag anywhere. That is why soup.article is None, and it means find_all("article") would return an empty list, so it would most likely not give you anything useful either.
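One possible way to pick a better container is to list which tags the parsed document actually contains. The sketch below does this on a made-up HTML string (sample_html and its structure are assumptions for illustration, not the real page), and then collects links from the whole document with an explicit check for a missing <span> instead of a blanket try/except.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; it has no <article> tag.
sample_html = """
<html><body>
  <div class="games">
    <a href="/one"><span>One</span></a>
    <a href="/two"></a>
  </div>
</body></html>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# find_all() returns an empty result rather than None when nothing matches.
articles = soup.find_all("article")

# find_all(True) matches every tag, so this lists the tag names that do exist.
tag_names = sorted({tag.name for tag in soup.find_all(True)})
print(len(articles), tag_names)

# Collect links from the whole document, tolerating links without a <span>.
links = []
for link in soup.find_all("a"):
    span = link.find("span")
    links.append({
        "url": link.get("href"),
        "text": span.get_text() if span is not None else None,
    })
print(links)
```

If the real page builds its links with JavaScript after loading, they will not appear in the HTML that urlopen() returns, and this approach will find nothing; that would need to be checked against the actual page source.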
Upvotes: 1