Reputation: 2334
I wanted to write a piece of code like the following:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)
But I found that I have to install the urllib3 package now.
Moreover, I couldn't find any tutorial or example showing how to rewrite the above code; for example, urllib3 does not have urlopen.
Any explanation or example, please?!
P.S. I'm using Python 3.4.
Upvotes: 74
Views: 182964
Reputation: 1340
You should use urllib.request, not urllib3.
import urllib.request # not urllib - important!
urllib.request.urlopen('https://...')
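As a rough sketch of how the question's snippet might look with only the standard library: the helper name fetch_bytes and the timeout value are my own choices for illustration, not anything from the question.

```python
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the raw response body for url as bytes.

    fetch_bytes and the 10-second default timeout are illustrative choices.
    """
    # On Python 3, urlopen lives in urllib.request
    # (on Python 2 it was urllib2.urlopen).
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

The returned bytes can then be handed to BeautifulSoup, for example BeautifulSoup(fetch_bytes(url), "html.parser").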
Upvotes: -1
Reputation: 115
urllib3 has no module-level urlopen function; instead, try requests:
import requests
url = 'http://www.thefamouspeople.com/singers.php'
html = requests.get(url).text  # .text is the decoded response body
Upvotes: 0
Reputation: 10538
With gazpacho you could pipeline the page straight into a parse-able soup object:
from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)
And run finds on top of it:
soup.find("div")
Upvotes: 0
Reputation: 473863
You do not have to install urllib3. You can choose any HTTP-request-making library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests because of its rich feature set and convenient API. You can install requests by entering pip install requests on the command line. Here is a basic example:
from bs4 import BeautifulSoup
import requests
url = "url"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
Upvotes: 40
Reputation: 158
The new urllib3 library has nice documentation.
To get your desired result, you should do the following:
import urllib3
from bs4 import BeautifulSoup
url = 'http://www.thefamouspeople.com/singers.php'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'))
The "decode utf-8" part is optional. It worked without it when I tried, but I posted the option anyway.
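The decode step is optional because BeautifulSoup accepts raw bytes and works out the encoding itself. If you do prefer to decode by hand, hard-coding utf-8 will raise an error on pages in another encoding, so something like the following may be safer. This is only a sketch: decode_body and its declared_charset parameter are hypothetical names, not part of urllib3 or bs4.

```python
def decode_body(data, declared_charset=None):
    """Decode response bytes to str, preferring the charset the server declared.

    decode_body and declared_charset are hypothetical names for this sketch.
    """
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return data.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: fall through to the next candidate.
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")
```

With the answer's code this would be called as decode_body(response.data), optionally passing a charset taken from the Content-Type header.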
Source: User Guide
Upvotes: 13
Reputation: 18157
urllib3 is a different library from urllib and urllib2. It offers features beyond those of the urllib modules in the standard library, such as connection re-use, if you need them. The documentation is here: https://urllib3.readthedocs.org/
If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data)
Upvotes: 62