Rahul

Reputation: 214

Why does Reddit return a 502 error when accessing a page using BeautifulSoup4?

I have written a small script to check whether a username exists on popular websites like Facebook, Instagram, etc. Here is the code.

import requests
from termcolor import colored, cprint 
from time import sleep
from bs4 import BeautifulSoup

status_code_html = 'https://en.wikipedia.org/wiki/List_of_HTTP_status_codes'
uname = input("Enter the username: ")
width = 10

websites = {
    'Facebook': 'https://www.facebook.com/',
    'Twitter': 'https://twitter.com/',
    'Instagram': 'https://www.instagram.com/',
    'Youtube': 'https://www.youtube.com/user/',
    'Reddit': 'https://www.reddit.com/user/'
}

for site, url in websites.items():

    try:
        # Request the profile page for this username
        response = requests.get(url + uname)
        # Look up the human-readable status text on the Wikipedia list of
        # HTTP status codes: find the element whose id matches the status
        # code and read the text of its parent <dt>
        page = requests.get(status_code_html)
        soup = BeautifulSoup(page.content, 'html.parser')
        tag = soup.find(id=response.status_code)
        status = tag.find_parent('dt').text
        # Raise an exception on 4xx/5xx responses so the except branch runs
        response.raise_for_status()

    except:
        print(site.rjust(width), '   :', 'Fail'.ljust(width), '(Status:', status, ')')

    else:
        print(site.rjust(width), '   :', 'Success'.ljust(width), '(Status:', status, ')')

The output of the above code is:

Enter the username: ********
  Facebook    : Success    (Status: 200 OK )
   Twitter    : Success    (Status: 200 OK )
 Instagram    : Success    (Status: 200 OK )
   Youtube    : Success    (Status: 200 OK )
    Reddit    : Fail       (Status: 502 Bad Gateway )

This code works for every website except reddit.com; requests.get() returns a 502 error page. Can someone help me resolve this issue?

Upvotes: 1

Views: 927

Answers (1)

chitown88

Reputation: 28630

Adding a user agent to the headers parameter should fix that:

import requests
from termcolor import colored, cprint 
from time import sleep
from bs4 import BeautifulSoup

status_code_html = 'https://en.wikipedia.org/wiki/List_of_HTTP_status_codes'
uname = input("Enter the username: ")
width = 10

# A browser-like User-Agent; Reddit rejects the default python-requests one
headers = {'user-agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Mobile Safari/537.36'}

websites = {
    'Facebook': 'https://www.facebook.com/',
    'Twitter': 'https://twitter.com/',
    'Instagram': 'https://www.instagram.com/',
    'Youtube': 'https://www.youtube.com/user/',
    'Reddit': 'https://www.reddit.com/user/'
}

for site, url in websites.items():

    try:
        # Send the spoofed User-Agent with every profile request
        response = requests.get(url+uname, headers=headers)
        page = requests.get(status_code_html)
        soup = BeautifulSoup(page.content, 'html.parser')
        tag = soup.find(id=response.status_code)
        status = tag.find_parent('dt').text
        response.raise_for_status()

    except:
        print(site.rjust(width), '   :', 'Fail'.ljust(width), '(Status:', status, ')')

    else:
        print(site.rjust(width), '   :', 'Success'.ljust(width), '(Status:', status, ')')
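
The 502 is not a real gateway failure on Reddit's side; Reddit simply refuses requests that identify themselves with the default python-requests User-Agent. A minimal, illustrative check along these lines (the username 'spez' is just an example, and the exact status Reddit returns for unidentified clients can vary, e.g. 429 instead of 502) makes the difference visible:

import requests

# Illustrative comparison: the same Reddit URL with and without a
# browser-like User-Agent. 'spez' is just an example username.
url = 'https://www.reddit.com/user/spez'

bare = requests.get(url)  # uses the default "python-requests/x.y" User-Agent
print('Default User-Agent :', bare.status_code)

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
spoofed = requests.get(url, headers=headers)
print('Browser User-Agent :', spoofed.status_code)

As an aside, response.reason already holds the textual description of the status code ('OK', 'Bad Gateway', ...), so scraping the Wikipedia page on every iteration could be dropped entirely.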

Upvotes: 3
