Reputation: 11

Web scraping using Beautiful Soup

I'm using Beautiful Soup to scrape a site.

Code:

    from bs4 import BeautifulSoup as soup
    
    from urllib.request import urlopen as uReq
    my_url = 'https://www.bewakoof.com/biker-t-shirts'
    uClient = uReq(my_url)
    
    
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    
    containers = page_soup.findAll("div", {"class": "productGrid"})
    
    print(len(containers))

I am getting the error shown below.

Error

    o = containerClass(current_data)
    TypeError: __init__() takes 1 positional argument but 2 were given

Upvotes: 1

Answers (1)

BlueScreen

Reputation: 218

When I tried to run part of your code, I got an error as well.

After that, I tried using requests:

    >>> my_url = 'https://www.bewakoof.com/biker-t-shirts'
    >>> import requests as req
    >>> r = req.get(my_url)
    >>> r
    <Response [403]>

You got status code 403, which means that the server understood the request but refuses to authorize it. You can get more information about that here.
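A minimal sketch of how such a status could be detected and reported before any parsing is attempted. It uses only the standard library; `describe_status` and `try_fetch` are hypothetical helper names, not part of any library:

```python
from http import HTTPStatus
from urllib.error import HTTPError
from urllib.request import urlopen


def describe_status(code):
    """Return the status code with its standard reason phrase, e.g. '403 Forbidden'."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return f"{code} (non-standard status)"


def try_fetch(url):
    """Hypothetical helper: return the page bytes, or None on an HTTP error status."""
    try:
        with urlopen(url) as response:
            return response.read()
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses such as 403.
        print("Request failed:", describe_status(err.code))
        return None
```

Checking the result of `try_fetch(my_url)` for `None` before handing it to Beautiful Soup keeps a refused request from being mistaken for a parsing problem.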

Most often, this error is caused by basic protection against scrapers. To get around it, send headers that make the request look like it comes from a browser. To do this, install the requests library, then create a dict:

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

You can substitute your own values for these. The easiest way to find them is in the Network tab of your browser's developer tools (press F12 in Chrome).

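Then:

    import requests as req

    url = "url"  # replace with the page you want to fetch
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
    # headers must be passed by keyword; the second positional argument of get() is params
    r = req.get(url, headers=headers)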
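If you would rather stay with `urllib.request`, as in the question, the same header trick can be sketched roughly as follows. `build_request` and `fetch_html` are hypothetical helper names, and the User-Agent string is just the example value from above:

```python
from urllib.request import Request, urlopen

# Example User-Agent copied from the answer; substitute your own browser's value.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
    )
}


def build_request(url, headers=BROWSER_HEADERS):
    """Hypothetical helper: wrap the URL in a Request carrying browser-like headers."""
    return Request(url, headers=headers)


def fetch_html(url):
    """Hypothetical helper: download the page body as bytes using those headers."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

The bytes returned by `fetch_html(my_url)` can then be passed to `soup(page_html, "html.parser")` exactly as in the question. Whether the server accepts the request still depends on the site.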

In this situation, however, the problem is different: the site you are trying to access simply does not work.

Upvotes: 1
