Reputation: 190679
I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file first so that BeautifulSoup can read it.
import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> then write htmlSource to a file for BeautifulSoup to read
Is there a way to pass the output of urllib to BeautifulSoup without writing it to a file first?
Upvotes: 13
Views: 30065
Reputation: 10528
You could open the url, download the html, and make it parse-able in one shot with gazpacho:
from gazpacho import Soup
soup = Soup.get("https://www.example.com/")
Upvotes: 0
Reputation: 110069
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(htmlSource)
No file writing needed: just pass in the HTML string. You can also pass the object returned from urlopen directly:
f = urllib.urlopen("http://SOMEWHERE")
soup = BeautifulSoup(f)
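For anyone on Python 3, here is a minimal sketch of the same two options. It assumes the newer bs4 package (beautifulsoup4) rather than the BeautifulSoup 3 import above. The make_soup helper and the sample HTML are made up for illustration, and io.BytesIO only stands in for the response object so the snippet runs without a network connection.

```python
import io

from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed


def make_soup(source):
    """Hypothetical helper: parse HTML given as a string, bytes, or open file-like object."""
    # "html.parser" selects the parser that ships with Python's standard library.
    return BeautifulSoup(source, "html.parser")


# Made-up sample page, used in place of a real download.
html = "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

# Option 1: pass the HTML string directly, no file involved.
soup_from_string = make_soup(html)

# Option 2: pass a file-like object. BytesIO mimics a response that has .read().
fake_response = io.BytesIO(html.encode("utf-8"))
soup_from_file = make_soup(fake_response)

print(soup_from_string.title.string)
print(soup_from_file.p.string)
```

In Python 3 the urlopen function lives in urllib.request, so with a real page you would probably pass urllib.request.urlopen(url) (or the result of its read()) in place of fake_response.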
Upvotes: 25