J.doe

Reputation: 21

How to change the user agent in urllib2

I'm trying to access a page using the following code:

page = urllib2.urlopen(full_url)
soup = BeautifulSoup(page, 'html.parser')

li_post_id = "post-" + str(post_id)
li_soup = soup.find('li', attrs={'id':li_post_id})

This works fine on my Ubuntu machine, but when I run it on my Windows server I get a 403 Forbidden error, so I assume the issue is with the user agent.

How do I change it to, say, Firefox's? I have only seen tutorials that change the user agent using requests, and I don't want to rewrite all of my code to use that library.

Upvotes: 2

Views: 1913

Answers (2)

Evya

Reputation: 2375

The User-Agent header has nothing to do with BeautifulSoup, which only parses HTML. You need to set it on the urllib opener that makes the request, like so:

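Python 3

import urllib.request

# build_opener() returns an OpenerDirector, not a request object
opener = urllib.request.build_opener()
# Replace the default 'Python-urllib/x.y' user agent
opener.addheaders = [('User-Agent', 'Some user agent')]
response = opener.open('http://www.stackoverflow.com')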

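Python 2.7

import urllib2

# build_opener() returns an OpenerDirector, not a request object
opener = urllib2.build_opener()
# Replace the default 'Python-urllib/x.y' user agent
opener.addheaders = [('User-Agent', 'Some user agent')]
response = opener.open('http://www.stackoverflow.com')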
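
To tie this back to the code in the question, here is a minimal sketch, assuming Python 3 (in Python 2.7 the same functions live in urllib2). Installing the opener globally means existing urlopen(full_url) calls pick up the new header without being rewritten. The helper names install_user_agent and build_request and the Firefox-style string are illustrative assumptions, not anything from the question.

```python
import urllib.request

# Illustrative Firefox-style user agent (an assumption; use any value the site accepts).
FIREFOX_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def install_user_agent(user_agent=FIREFOX_UA):
    """Build an opener with a custom User-Agent and make urlopen() use it."""
    opener = urllib.request.build_opener()
    # Overwrite the default header list, which only holds the Python-urllib agent.
    opener.addheaders = [("User-Agent", user_agent)]
    # From here on, plain urllib.request.urlopen(...) goes through this opener.
    urllib.request.install_opener(opener)
    return opener


def build_request(url, user_agent=FIREFOX_UA):
    """Alternative: attach the header to a single Request instead of globally."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

With this in place, calling install_user_agent() once at startup should leave the rest of the question's code unchanged: page = urllib.request.urlopen(full_url) followed by BeautifulSoup(page, 'html.parser'). If only one call needs the header, passing build_request(full_url) to urlopen avoids touching global state.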

Upvotes: 1

Orhan Solak

Reputation: 809

You could try sending a randomly chosen browser user agent with requests:

import random

import requests
from bs4 import BeautifulSoup

# Pool of browser-like user agent strings to choose from
agents = [
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)',
    'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)',
]

# full_url is the URL from the question
headers = {"User-Agent": random.choice(agents)}
response = requests.get(full_url, headers=headers)
# The 'lxml' parser needs the lxml package; 'html.parser' works without it
soup = BeautifulSoup(response.text, 'lxml')

Upvotes: 1
