Reputation: 11
I'm using Beautiful Soup to scrape a site.
Code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url = 'https://www.bewakoof.com/biker-t-shirts'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class": "productGrid"})
print(len(containers))
I am getting the error below.
Error
o = containerClass(current_data)
TypeError: __init__() takes 1 positional argument but 2 were given
Upvotes: 1
Views: 189
Reputation: 218
When I tried to run part of your code, I got an error:
After that, I tried using requests:
>>> my_url = 'https://www.bewakoof.com/biker-t-shirts'
>>> import requests as req
>>> r = req.get(my_url)
>>> r
<Response [403]>
You got status code 403, which means the server understood the request but refuses to authorize it. You can get more information about that here
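If you want to look up what a status code means without leaving Python, the standard library's http.HTTPStatus enum can be used. This is only a small sketch; the helper name describe_status is made up for illustration and is not part of requests or urllib.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the numeric code together with its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


# 403 is the code returned in the session above.
print(describe_status(403))
```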
Most often, this error comes from basic protection against scrapers. To get around it, send headers that make the site think the request comes from a browser.
To do this, install the requests library,
then create a dict:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
You can substitute your own values here. The easiest way to find them is in the Network tab of your browser's developer tools (press F12 in Chrome)
Then
import requests as req
url = "url"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
r = req.get(url, headers=headers)  # pass headers by keyword; the second positional argument is params
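Since the question uses urllib rather than requests, the same idea can probably be applied there as well by wrapping the URL in a urllib.request.Request that carries the headers. The sketch below is only an illustration: the names BROWSER_HEADERS, build_request and fetch_html are made up for this example, and there is no guarantee that this particular site will accept the request.

```python
from urllib.request import Request, urlopen

# Example browser-like User-Agent; substitute the value your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
    )
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request object that carries the given headers."""
    return Request(url, headers=headers)


def fetch_html(url, timeout=10):
    """Download the page body as bytes, sending browser-like headers."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

The bytes returned by fetch_html(my_url) can then be passed to soup(page_html, "html.parser") exactly as in the question's code.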
But in this situation, the problem is different. The site you are trying to access simply does not work:
Upvotes: 1