prosseek

Reputation: 190679

Using urllib and BeautifulSoup to retrieve info from web with Python

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file first so that BeautifulSoup can read it.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup on the urllib output without writing it to a file first?

Upvotes: 13

Views: 30065

Answers (2)

emehex

Reputation: 10528

You could open the URL, download the HTML, and make it parseable in one shot with gazpacho:

from gazpacho import Soup
soup = Soup.get("https://www.example.com/")

Upvotes: 0

interjay

Reputation: 110069

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

No file writing is needed: just pass in the HTML string. You can also pass the object returned by urlopen directly:

import urllib

# urlopen returns a file-like object; BeautifulSoup reads from it directly
f = urllib.urlopen("http://SOMEWHERE")
soup = BeautifulSoup(f)
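
The snippets above use the Python 2 urllib.urlopen and the old BeautifulSoup 3 import. On Python 3 the equivalent names are urllib.request.urlopen and the bs4 package, and the same idea might look like the rough sketch below. It assumes beautifulsoup4 is installed. To run without network access it uses an in-memory io.BytesIO as a stand-in for the object urlopen returns. The helper names and the sample HTML are made up for illustration.

```python
import io

from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed


def soup_from_string(html_source):
    """Parse HTML that is already in memory as a string; no file is written."""
    return BeautifulSoup(html_source, "html.parser")


def soup_from_filelike(f):
    """Parse from any object with a read() method, such as a urlopen response."""
    return BeautifulSoup(f, "html.parser")


# Hypothetical sample page standing in for the bytes a server would send.
sample = b"<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

# Option 1: read the bytes yourself, decode them, and pass the string.
soup_a = soup_from_string(sample.decode("utf-8"))

# Option 2: hand over the file-like object and let BeautifulSoup read it.
# With a real URL this would be: urllib.request.urlopen("http://SOMEWHERE")
soup_b = soup_from_filelike(io.BytesIO(sample))

title_a = soup_a.title.string
title_b = soup_b.title.string
paragraph = soup_b.p.get_text()
```

Naming the parser explicitly ("html.parser" here) avoids the warning bs4 prints when it has to pick one itself.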

Upvotes: 25
