Reputation: 13
I'm starting to learn to code more complex things in Python, and today I decided to work with BeautifulSoup. The problem appears when I try to get the title of a product. I tried changing ".find" to ".findAll" but couldn't find a solution. Can someone please help me? Here is my code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup
ListaSteam = "https://store.steampowered.com/search/?sort_by=Price_ASC&category1=998%2C996&category2=29"
# Page: fetch, read, close
Pagina = uReq(ListaSteam)
PaginaHtml = Pagina.read()
Pagina.close()
# Step 1
PaginaSoup = Soup(PaginaHtml, "html.parser")
CodigoJuegos = PaginaSoup.find("div",{"id":"search_resultsRows"})
PRUEBA = CodigoJuegos.a.span["title"]
print(PRUEBA)
The error is as follows:
Traceback (most recent call last):
  File "C:\Users\Usuario\Desktop\******", line 14, in <module>
    PRUEBA = CodigoJuegos.a.span["title"]
  File "C:\Users\Usuario\AppData\Local\Programs\Python\Python39\lib\site-packages\bs4\element.py", line 1406, in __getitem__
    return self.attrs[key]
KeyError: 'title'
Upvotes: 0
Views: 392
Reputation: 9405
First off, you should follow PEP 8 style; as written, your code is hard to read.
If you want to solve it with the smallest possible change to your code, do the following:
PRUEBA = CodigoJuegos.a.span.text
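To illustrate why the original line raised KeyError, here is a minimal sketch. The HTML snippet is made up for illustration (it assumes the title sits in a span with class "title", as the selectors in this thread suggest); the real Steam markup may differ. Indexing a tag with square brackets looks up an HTML attribute, and the span has a class attribute but no title attribute, so the text has to be read with .text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not copied from the Steam page.
html = """
<div id="search_resultsRows">
  <a href="#"><span class="title">Example Game</span></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("div", {"id": "search_resultsRows"}).a.span

# Square brackets look up an HTML attribute, so span["title"] fails here.
try:
    span["title"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

span_classes = span["class"]  # "class" is multi-valued, so this is a list
span_text = span.text         # the text between the opening and closing tags

print(lookup_failed, span_classes, span_text)
```

If an attribute might be missing, span.get("title") returns None instead of raising.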
That said, I scrape websites professionally (using bs4, among other tools), and I'd go for something like this:
import requests
from bs4 import BeautifulSoup
search_url = "https://store.steampowered.com/search"
category1 = ('998', '996')
category2 = '29'
params = {
    'sort_by': 'Price_ASC',
    'category1': ','.join(category1),
    'category2': category2,
}
response = requests.get(
    search_url,
    params=params
)
soup = BeautifulSoup(response.text, "html.parser")
elms = soup.find_all("span", {"class": "title"})
for elm in elms:
    print(elm.text)
Output:
Barro F
The Last Hope: Trump vs Mafia - North Korea
Ninja Stealth
Tetropunk
Oracle
Legend of Himari
Planes, Bullets and Vodka
Shift
Blast-off
...
If you already depend on bs4, you might as well add requests too.
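As a quick check that the params dict in the code above maps to the same query string as the URL in the question, here is a small sketch using only the standard library. As far as I know, requests encodes params in essentially the same way, but treat that as an assumption; this only demonstrates urlencode.

```python
from urllib.parse import urlencode

category1 = ("998", "996")
params = {
    "sort_by": "Price_ASC",
    "category1": ",".join(category1),  # joined with a comma: "998,996"
    "category2": "29",
}

# urlencode percent-encodes the comma as %2C, as in the question's URL.
query = urlencode(params)
print(query)
```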
Upvotes: 1
Reputation: 2619
Use the CSS selector 'span.title':
CodigoJuegos = PaginaSoup.select('span.title')
for t in CodigoJuegos:
    print(t.text)
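A slightly narrower variant, as a sketch: the selector can be scoped to the results container so that a span.title elsewhere on the page is not picked up. The HTML below is invented for illustration and only assumes the id and class names already used in this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may be structured differently.
html = """
<span class="title">Outside the results</span>
<div id="search_resultsRows">
  <a href="#"><span class="title">Game One</span></a>
  <a href="#"><span class="title"> Game Two </span></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "#id descendant" limits matches to spans inside the results container.
matches = soup.select("#search_resultsRows span.title")
titles = [m.get_text(strip=True) for m in matches]

# select_one returns only the first match (or None if nothing matches).
first = soup.select_one("#search_resultsRows span.title")

print(titles)
```

Note that get_text(strip=True) also trims the stray whitespace around "Game Two".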
Upvotes: 0
Reputation: 1088
Maybe you want the text of the title span rather than an attribute lookup:
PRUEBA = CodigoJuegos.a.find("span", class_="title").get_text(strip=True)
Upvotes: 0