Reputation: 113
I'm struggling to integrate BeautifulSoup into my current Python project. To keep this plain and simple, I've reduced the complexity of my current script.
Script without BeautifulSoup:
import urllib2
def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
    except:
        self.insert()
    try:
        if 'example text' in resp.read():
            print 'success'
    # except clause omitted in this sketch
This is just a sketch of what I have going on. In simple terms, I'm sending a POST request to "example.com", and if the response from resp.read() contains "example text", I print success.
But what I actually want is to check
if ' example ' in resp.read()
then output the text inside the fifth right-aligned td of the example.com response, using
soup.find_all('td', {'align':'right'})[4]
The way I'm implementing BeautifulSoup isn't working. Here is an example:
import urllib2
from bs4 import BeautifulSoup as soup
main_div = soup.find_all('td', {'align':'right'})[4]
def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
        web_soup = soup(urllib2.urlopen(req), 'html.parser')
    except:
        self.insert()
    try:
        if 'example text' in resp.read():
            print 'success' + main_div
    # except clause omitted in this sketch
As you can see, I added four new lines/adjustments:
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
as well as " + main_div " on the print statement.
However, it just doesn't seem to be working. I've had a few errors whilst adjusting, some of which said "Local variable referenced before assignment" and "unbound method find_all must be called with beautifulsoup instance as first argument".
Upvotes: 0
Views: 417
Reputation: 751
Regarding your last code snippet:
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
You should call find_all on the web_soup instance, not on soup, which is only an alias for the BeautifulSoup class. Also be sure to define the url variable before you use it:
import urllib2
from bs4 import BeautifulSoup as soup
url = "url to be opened"
web_soup = soup(urllib2.urlopen(url), 'html.parser')
# Call find_all on the parsed document (the instance), not on the class.
main_div = web_soup.find_all('td', {'align':'right'})[4]
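For what it's worth, here is a minimal sketch of how the check itself could look once the body has been fetched. It assumes Python 3 (where urllib.request replaces urllib2) and works on an HTML string so that the response is read only once and parsed from that same string. The function name fifth_right_aligned_cell, the marker parameter, and the sample HTML are all hypothetical and not from your script. It also returns the cell's text rather than the Tag object, since concatenating a Tag to a string in print would fail.

```python
from bs4 import BeautifulSoup


def fifth_right_aligned_cell(html, marker):
    """Return the text of the fifth <td align="right">, or None.

    `html` is the response body, read exactly once by the caller.
    `marker` is the text that must be present (hypothetical parameter).
    """
    if marker not in html:
        return None
    # Parse the already-read body; find_all is called on the instance.
    web_soup = BeautifulSoup(html, 'html.parser')
    cells = web_soup.find_all('td', {'align': 'right'})
    # Guard the index so a page with fewer cells does not raise IndexError.
    if len(cells) < 5:
        return None
    return cells[4].get_text(strip=True)


# Made-up page with six right-aligned cells, standing in for the real response.
sample = "<table><tr>" + "".join(
    '<td align="right">cell %d</td>' % i for i in range(6)
) + "</tr></table> example text"

result = fifth_right_aligned_cell(sample, 'example text')
print('success ' + str(result))
```

In your check method you would pass in the decoded body of the response, and only after the request has succeeded. If urlopen raises and the except branch runs, resp is never assigned, which is likely where the "Local variable referenced before assignment" error comes from.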
Upvotes: 1