Reputation: 128
I am using the following code from this tutorial (http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/).
from bs4 import BeautifulSoup
soup = BeautifulSoup (open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()
trs = soup.find_all('tr')
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results
tds = tr.find_all("td")
try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
    names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
    years = str(tds[1].get_text())
    positions = str(tds[2].get_text())
    parties = str(tds[3].get_text())
    states = str(tds[4].get_text())
    congress = tds[5].get_text()
except:
    print "bad tr string"
    continue #This tells the computer to move on to the next item after it encounters an error
print names, years, positions, parties, states, congress
However, I get an error saying that 'continue' is not properly in the loop on line 27. I am using Notepad++ and Windows PowerShell. How do I make this code work?
Upvotes: 2
Views: 333
Reputation: 2399
Whitespace is significant in Python.
This is where things go downhill:
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results
Indent every statement that belongs to a loop, and keep indenting by the appropriate number of tabs for as long as you want the statements to stay inside that loop:
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
Upvotes: 1
Reputation: 53
My answer may be simple, but the continue really is not inside a loop. Like break, continue is only valid inside a loop body (it can sit in a conditional or a try/except, as long as that is itself inside the loop). Your indentation is probably off, and correct indentation is mandatory in Python.
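To see the rule in isolation, here is a minimal sketch that compiles two made-up snippets with the built-in compile(). The names broken, fixed and compiles are only for illustration and are not from the question. It is written with the Python 3 print() function, but the rule is the same in Python 2.

```python
# Two hypothetical snippets. In the first, `continue` sits at column 0,
# after the loop body has already ended. In the second it is indented
# inside the loop.
broken = (
    "for n in [1, 2, 3]:\n"
    "    pass\n"
    "continue\n"
)
fixed = (
    "for n in [1, 2, 3]:\n"
    "    if n == 2:\n"
    "        continue\n"
)


def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


broken_ok = compiles(broken)
fixed_ok = compiles(fixed)
print(broken_ok, fixed_ok)
```

The first snippet is rejected before any code runs, which is why the error shows up as a syntax error rather than at the moment the loop executes.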
Upvotes: 0
Reputation: 17076
You have to indent your code one more level (i.e. 4 spaces or 1 tab) beyond the indentation of the for loop. The try/except isn't in the for loop, which is why you get the continue error.
Indentation shows which statements belong together in a block (a for loop starts a new block, and you need to indent everything underneath it).
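As a rough sketch of the same structure without BeautifulSoup, the loop below uses plain lists as stand-ins for the table rows. The rows list and its placeholder values are made up for illustration, and the example uses the Python 3 print() function.

```python
# Hypothetical stand-in for the table: each inner list plays the role
# of the <td> cells of one <tr>. The second row is deliberately too short.
rows = [
    ["name1", "years1", "position1"],
    ["incomplete"],
    ["name2", "years2", "position2"],
]

parsed = []
for cells in rows:
    # Everything indented under the `for` belongs to the loop body.
    try:
        name = cells[0]
        years = cells[1]
        position = cells[2]
    except IndexError:
        print("bad row")
        continue  # legal here: we are still inside the for block
    parsed.append((name, years, position))

# Back at column 0: this line runs once, after the loop has finished.
print(parsed)
```

Because the try/except is indented under the for, the continue skips the short row and the loop carries on with the next one.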
Upvotes: 0
Reputation: 13699
It looks like your indentation is off. Try this:
from bs4 import BeautifulSoup
soup = BeautifulSoup (open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()
trs = soup.find_all('tr')
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
    tds = tr.find_all("td")
    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error
    print names, years, positions, parties, states, congress
Upvotes: 1
Reputation: 304137
Everything from print fulllink down is outside the for loop.
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        ## indented here!!!!!
        print fulllink #print in terminal to verify results
    tds = tr.find_all("td")
    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error
    print names, years, positions, parties, states, congress
Upvotes: 2