user7360021

Reputation: 51

urlopen from urllib.request cannot open a page in Python 3.7

I want to write a web scraper to collect the titles of articles from the Medium.com webpage.

I am trying to write a Python script that scrapes headlines from the Medium.com website. I am using Python 3.7 and imported urlopen from urllib.request, but it cannot open the site and raises an "urllib.error.HTTPError: HTTP Error 403: Forbidden" error.

from bs4 import BeautifulSoup
from urllib.request import urlopen

webAdd = urlopen("https://medium.com/")  # the 403 is raised on this line
bsObj = BeautifulSoup(webAdd.read(), "html.parser")

Result: urllib.error.HTTPError: HTTP Error 403: Forbidden

I expected it to read the website without showing any error.

However, the error does not occur when I use the requests module:

import requests 
from bs4 import BeautifulSoup 
url = 'https://medium.com/' 
response = requests.get(url, timeout=5)

This time it works without error.

Why?

Upvotes: 1

Views: 2521

Answers (3)

alex

Reputation: 1917

This worked for me:

from urllib.request import urlopen

html = urlopen(MY_URL)  # MY_URL is the address of the page to fetch
contents = html.read()  # raw bytes of the response body
print(contents)

Upvotes: 0

Nick H

Reputation: 1079

Many sites nowadays check the User-Agent header of incoming requests to try to deter bots. By default urllib identifies itself as Python-urllib/<version>, which is presumably what triggers the 403 here. requests is the better module to use, but if you really want to use urllib, you can alter the headers to pretend to be Firefox or another browser so that the request is not blocked. A quick example can be found here:

https://stackoverflow.com/a/16187955

import urllib.request

user_agent = 'Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion'

url = "http://example.com"
request = urllib.request.Request(url)
request.add_header('User-Agent', user_agent)
response = urllib.request.urlopen(request)

You will need to replace the placeholders in the user_agent string (platform, geckoversion, geckotrail, firefoxversion) with real values. Hope this helps.
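
To see what urllib announces itself as, and to check that a custom header is really attached, something like the following sketch may help. It sends nothing over the network. The BROWSER_AGENT value and the build_request helper are my own illustrative assumptions, not anything Medium requires.

```python
import urllib.request

# The default opener carries the headers urllib adds to every request.
opener = urllib.request.build_opener()
default_agent = dict(opener.addheaders)["User-agent"]
print(default_agent)  # e.g. Python-urllib/3.7, depending on your Python version

# Assumed example of a browser-like value, for illustration only.
BROWSER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent=BROWSER_AGENT):
    """Hypothetical helper: build a Request with an explicit User-Agent.

    Nothing is sent until the result is passed to urllib.request.urlopen().
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://medium.com/")
# urllib stores header names capitalized as 'User-agent'.
print(request.get_header("User-agent"))
```

Passing the returned object to urllib.request.urlopen() would then send the request with that header. Whether a given site accepts it is up to the site.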

Upvotes: 3

Murtaza Haji

Reputation: 1193

urllib is a fairly old and minimal module. For web scraping, the requests module is recommended. You can check out this answer for additional information.

Upvotes: 4
